The goal of this Notebook is to take the parts of NovemberPhyloProc.ipynb that I expect to go into the actual manuscript.
Figure 1. Antibodies over time.
Figure 2. Weighted unifrac PCoA
Table 1. Kernel regression and weighted unifrac 1 test - binomial
Table S1. Kernel regression and weighted unifrac 1 test - gaussian
Table S2. Weighted unifrac components 2-N
Figure S1. Jensen Shannon at different agglomeration levels.
Figure S2. Statistically significant (p< 0.05, q < 0.2) family genus and species abundances (clr transformed) regressed against weighted unifrac 1.
Table S3. All family - genus and species vs antibody vs glm scores.
Figure 3. Stacked bars of key groups ordered by weighted unifrac axis 1.
Figure S3. Groups associated with IgGs.
Figure 4. Proportionality heat-map
3/20/2017 Re-ran whole pipeline end-to-end. Hits on everything (including igg gp41 0 day) except only trending on gp41 Month 6.5. However I don’t see family level groups for gp41 that relate to community structure (q < 0.2, p < 0.05). I may just try running everything end to end a few more times to see what happens. This because I want to see how consistant (or otherwise) the results are between runs. Also, I’m going to start setting seeds now.
# only use library paths in the anaconda environment
#.libPaths(grep('anaconda3', .libPaths(), value = T))
.libPaths()
[1] "/home/jacob/Projects/Nyvac_096_Microbiome/packrat/lib/x86_64-pc-linux-gnu/3.6.1"
[2] "/home/jacob/Projects/Nyvac_096_Microbiome/packrat/lib-ext"
[3] "/home/jacob/Projects/Nyvac_096_Microbiome/packrat/lib-R"
R.version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 6.1
year 2019
month 07
day 05
svn rev 76782
language R
version.string R version 3.6.1 (2019-07-05)
nickname Action of the Toes
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] phyloseq_1.22.3
loaded via a namespace (and not attached):
[1] nlme_3.1-137 bitops_1.0-6 matrixStats_0.54.0 RColorBrewer_1.1-2
[5] rprojroot_1.3-2 GenomeInfoDb_1.14.0 tools_3.6.1 backports_1.1.2
[9] R6_2.3.0 vegan_2.5-3 lazyeval_0.2.1 BiocGenerics_0.24.0
[13] mgcv_1.8-25 colorspace_1.3-2 permute_0.9-4 ade4_1.7-13
[17] tidyselect_0.2.5 gridExtra_2.3 phangorn_2.4.0 compiler_3.6.1
[21] Biobase_2.38.0 DelayedArray_0.4.1 scales_1.0.0 quadprog_1.5-5
[25] stringr_1.3.1 digest_0.6.18 Rsamtools_1.30.0 rmarkdown_1.10
[29] dada2_1.6.0 XVector_0.18.0 base64enc_0.1-3 pkgconfig_2.0.2
[33] htmltools_0.3.6 rlang_0.4.0 rstudioapi_0.8 bindr_0.1.1
[37] generics_0.0.2 hwriter_1.3.2 jsonlite_1.5 BiocParallel_1.12.0
[41] dplyr_0.7.7 RCurl_1.95-4.11 magrittr_1.5 GenomeInfoDbData_1.0.0
[45] biomformat_1.6.0 Matrix_1.2-15 Rcpp_0.12.19 munsell_0.5.0
[49] S4Vectors_0.16.0 ape_5.2 viridis_0.5.1 furrr_0.1.0
[53] stringi_1.2.4 yaml_2.2.0 MASS_7.3-51.1 SummarizedExperiment_1.8.1
[57] zlibbioc_1.24.0 rhdf5_2.22.0 plyr_1.8.4 grid_3.6.1
[61] parallel_3.6.1 listenv_0.7.0 crayon_1.3.4 lattice_0.20-38
[65] cowplot_0.9.3 Biostrings_2.46.0 splines_3.6.1 multtest_2.34.0
[69] knitr_1.20 pillar_1.3.0 igraph_1.2.2 GenomicRanges_1.30.3
[73] reshape2_1.4.3 codetools_0.2-15 stats4_3.6.1 fastmatch_1.1-0
[77] glue_1.3.0 packrat_0.4.8-1 evaluate_0.12 ShortRead_1.36.1
[81] rsample_0.0.5 latticeExtra_0.6-28 data.table_1.11.8 RcppParallel_4.4.1
[85] modelr_0.1.2 foreach_1.4.4 gtable_0.2.0 purrr_0.2.5
[89] tidyr_0.8.2 future_1.14.0 assertthat_0.2.0 ggplot2_3.1.0
[93] broom_0.5.0 viridisLite_0.3.0 survival_2.43-1 tibble_1.4.2
[97] iterators_1.0.10 GenomicAlignments_1.14.2 IRanges_2.12.0 bindrcpp_0.2.2
[101] cluster_2.0.7-1 globals_0.12.4
# https://stackoverflow.com/questions/46354826/have-a-function-that-calls-library-and-takes-either-a-package-or-its-name-as-inp
# Also return package version when loading in packages
# accept strings or functions
libver <- function(pac){
pac <- as.character(substitute(pac))
library(pac, character.only=TRUE)
packageVersion(pac)
}
#libver("dada2")
#libver("ggplot2")
libver("Cairo")
[1] ‘1.5.9’
# Much of the data handling
libver('phyloseq')
[1] ‘1.22.3’
# A bunch of environments, including ggplot, dplyr, tidyr, and broom, which I use a lot
libver('tidyverse')
Registered S3 method overwritten by 'rvest':
method from
read_xml.response xml2
[30m── [1mAttaching packages[22m ────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──[39m
[30m[32m✔[30m [34mggplot2[30m 3.1.0 [32m✔[30m [34mpurrr [30m 0.2.5
[32m✔[30m [34mtibble [30m 1.4.2 [32m✔[30m [34mdplyr [30m 0.7.7
[32m✔[30m [34mtidyr [30m 0.8.2 [32m✔[30m [34mstringr[30m 1.3.1
[32m✔[30m [34mreadr [30m 1.1.1 [32m✔[30m [34mforcats[30m 0.3.0[39m
[30m── [1mConflicts[22m ───────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
[31m✖[30m [34mdplyr[30m::[32mfilter()[30m masks [34mstats[30m::filter()
[31m✖[30m [34mdplyr[30m::[32mlag()[30m masks [34mstats[30m::lag()[39m
[1] ‘1.2.1’
# Mostly for concatenating ggplots
library(gridExtra); packageVersion("gridExtra")
Attaching package: ‘gridExtra’
The following object is masked from ‘package:dplyr’:
combine
[1] ‘2.3’
# I use this surprisingly not a lot here.
library(vegan); packageVersion("vegan")
Loading required package: permute
Loading required package: lattice
This is vegan 2.5-3
[1] ‘2.5.3’
# For making trees
# libver('phangorn')
# A prerequesite to phangorn
# libver("DECIPHER")
# Some pre-processing stuff
# libver("dada2")
# I usually reshape with tidyverse tools now, but melt and cast are often easier in a pinch
# libver("reshape2")
# For replacing NaNs without too much thought.
# libver("imputeMissings")
# Deal with proportional data, especially useful for calculating proportionality phi
libver('compositions')
Loading required package: tensorA
Attaching package: ‘tensorA’
The following object is masked from ‘package:base’:
norm
Loading required package: robustbase
Loading required package: energy
Loading required package: bayesm
Welcome to compositions, a package for compositional data analysis.
Find an intro with "? compositions"
Attaching package: ‘compositions’
The following objects are masked from ‘package:stats’:
cor, cov, dist, var
The following objects are masked from ‘package:base’:
%*%, scale, scale.default
[1] ‘1.40.2’
# Works with tidyverse to make model output tidy
libver('broom')
[1] ‘0.5.0’
# Make pretty tables
libver('knitr')
[1] ‘1.20’
libver('kableExtra')
[1] ‘0.9.0’
# Let those pretty tables actually show up in a jupyter notebook
#library('IRdisplay')
# For bootstrapping
libver('boot')
Attaching package: ‘boot’
The following object is masked from ‘package:robustbase’:
salinity
The following object is masked from ‘package:lattice’:
melanoma
[1] ‘1.3.20’
# Calculate kernel regressions
libver("MiRKAT")
Loading required package: survival
Attaching package: ‘survival’
The following object is masked from ‘package:boot’:
aml
The following object is masked from ‘package:robustbase’:
heart
Loading required package: PearsonDS
Loading required package: GUniFrac
Loading required package: ape
Attaching package: ‘ape’
The following object is masked from ‘package:compositions’:
balance
Loading required package: matrixStats
Attaching package: ‘matrixStats’
The following objects are masked from ‘package:robustbase’:
colMedians, rowMedians
The following object is masked from ‘package:dplyr’:
count
Loading required package: MASS
Attaching package: ‘MASS’
The following object is masked from ‘package:dplyr’:
select
[1] ‘1.0.1’
libver("car")
Loading required package: carData
Attaching package: ‘car’
The following object is masked from ‘package:boot’:
logit
The following object is masked from ‘package:dplyr’:
recode
The following object is masked from ‘package:purrr’:
some
[1] ‘3.0.2’
#libver(mclust)
#libver(chemometrics)
libver(purrrlyr)
[1] ‘0.0.3’
libver('qvalue')
[1] ‘2.10.0’
libver("breakaway")
[1] ‘4.6.11’
I have put the functions in a library file
source('libraries/library096.R')
# Set upOriginal to false, if you want to used user-reprocessed data.
# Results may differ slightly from those in the manuscript due to inter-run variation
# especially in the tree-ing algorithm.
upOriginal <- TRUE
# upOriginal <- FALSE
# For permutation tests, how fast do things need to run
# 9999 for most runs, 99999 for publication quality ones suggested
jnperm <- 9999
# Data paths
getwd()
[1] "/home/jacob/Projects/Nyvac_096_Microbiome"
(mapping_file_path <- file.path('data', 'mapping_file_096a.csv'))
[1] "data/mapping_file_096a.csv"
(immune_file_path <- file.path('data', 'immune096b.csv'))
[1] "data/immune096b.csv"
if(upOriginal){
seqtab_file_path <- file.path('data', 'seqtab.nochimNov2017.csv')
taxa_file_path <- file.path('data', 'TaxaNov2017.csv')
tree_path <- file.path('data', 'phylogeny096NovTree.tre')
} else {
seqtab_file_path <- file.path('data1', 'seqtab.nochimMar2018.csv')
taxa_file_path <- file.path('data1', 'TaxaMar2018.csv')
tree_path <- file.path('data1', 'phylogeny096Mar2018tre.tre')
}
seqtab_file_path
[1] "data/seqtab.nochimNov2017.csv"
taxa_file_path
[1] "data/TaxaNov2017.csv"
tree_path
[1] "data/phylogeny096NovTree.tre"
# Sequence data
seqtab.nochim.data <- read.csv(seqtab_file_path)
seqtabNames = gsub('\\.', '-',
gsub('.fastq', '', seqtab.nochim.data$X)
)
seqtab.nochim = as.matrix(seqtab.nochim.data[,-1])
rownames(seqtab.nochim) = seqtabNames
# Taxa names
taxa.data <- read.csv(taxa_file_path)
taxa = taxa.data[,-1]
## I reverse complemented the sequences to generate the taxonomy
# (but only in this latest re-run, not the original)
## The following undoes that reverse complement to get original sequence
#rownames(taxa) = dada2:::rc(taxa.data[,1])
if(upOriginal){
rownames(taxa) = (taxa.data[,1])} else {
rownames(taxa) = dada2:::rc(taxa.data[,1])
}
taxa <- as.matrix(taxa)
# Mapping file
mapping.data <- read_csv(mapping_file_path) %>%
mutate(pub_id = sapply(pub_id, function(x) {as.numeric(gsub("096-", "", x))}))
Parsed with column specification:
cols(
SampleID = col_character(),
BarcodeSequence = col_character(),
LinkerPrimerSequence = col_character(),
ReversePrimer = col_character(),
run_prefix = col_character(),
pub_id = col_character(),
Sex = col_character(),
Visit = col_integer(),
visitRank = col_integer(),
RXCode = col_character(),
Description = col_character()
)
The `printer` argument is deprecated as of rlang 0.3.0.
[90mThis warning is displayed once per session.[39m
#mapping = mapping.data[,-1]
#rownames(mapping) = mapping.data[,1]
#mapping <- as.matrix(mapping)
head(mapping.data)
# Immune Data
immune.data0 <- read_csv(immune_file_path)
Parsed with column specification:
cols(
visitno = col_integer(),
rx_code = col_character(),
type = col_character(),
antigen = col_character(),
mag = col_double(),
mag_bl = col_double(),
response = col_integer(),
day = col_integer(),
month = col_double(),
ct = col_character(),
response_j = col_double(),
assay = col_character(),
pub_id = col_character()
)
immune.data <- mutate(immune.data0, pub_id = sapply(pub_id, function(x) {as.numeric(gsub("096-", "", x))}))
levels(immune.data$antigen) <- gsub("[ /]", ".", levels(immune.data$antigen))
head(immune.data)
# Phylogenetic tree
seqs <- dada2::getSequences(seqtab.nochim)
names(seqs) <- seqs
pt <- ape::read.tree(file=tree_path)
pt2 <- phangorn::midpoint(pt)
immune.data$antigen %>% unique
[1] "gp41" "p24" "Con.6.gp120.B" "ZM96.gp140" "gp70_B.CaseA_V1_V2"
[6] "ANY.ENV.PTEG"
Save options to a variable
par0 <- options()
## minimal sample identification data
pub_id_key <- unique(immune.data[,c("pub_id", "rx_code", "ct")])
sample_sm0 <- dplyr::select(mapping.data, SampleID, pub_id, sex = Sex, muVisit = Visit, muVisitRank = visitRank)
sample_sm <- left_join(sample_sm0, pub_id_key, by = "pub_id") %>%
as.data.frame %>%
tibble::column_to_rownames(var = "SampleID")
`chr_along()` is deprecated as of rlang 0.2.0.
[90mThis warning is displayed once per session.[39mColumn `pub_id` has different attributes on LHS and RHS of join
# rownames(sample_sm)
# head(sample_sm)
# Make raw phyloseq object
ot <- otu_table(seqtab.nochim, taxa_are_rows=FALSE)
tt <- tax_table(taxa)
dimnames(tt) = dimnames(taxa)
spl <- sample_data(sample_sm)
psN is a really raw phyloseq object * OTU names are given as accession numbers * Numbers are in total counts * We have samples from both time points
# Quite raw phyloseq object. Species names are given as accession numbers
psN <- phyloseq(ot, tt, spl, pt2)
psN
phyloseq-class experiment-level object
otu_table() OTU Table: [ 960 taxa and 47 samples ]
sample_data() Sample Data: [ 47 samples by 6 sample variables ]
tax_table() Taxonomy Table: [ 960 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 960 tips and 959 internal nodes ]
I want to make a phyloseq object for use in essentally all of the subsequent analyses. Features include: * Some basic taxonomic pre-processing. * No uncharacterized phyla. * Only OTUs that show up at least 10% of the time in the final data set . * Do this after filtering samples. * No tip-glomming. I’ll save that untill later. * Immune data is included in the sample data table. * We’ll do Andrew’s representitive IgGs and IgAs. * We only have samples from visit 1. * We only have samples from experemental (not control) groups.
immune.data %>% pull(type) %>% unique
[1] "IgA" "IgG" "CD4+"
#immune.data %>% unite(type_antigen ,type, antigen, sep = "_")
immune.data %>% dplyr::select(pub_id, month, type, antigen, mag) %>%
filter(month %in% c(0, 6.5, 12)) %>%
unite(type_antigen, type, antigen, sep = "_") %>%
unite(type_antigen_month,type_antigen, month, sep = "_Month_") %>%
spread(key = type_antigen_month, value = mag, drop = TRUE) -> immune.table
immune.table %>% head
Some investegation suggested by the phyloseq tutorials to identify phyla for removal, and to identify an abundance threshold
psN
phyloseq-class experiment-level object
otu_table() OTU Table: [ 960 taxa and 47 samples ]
sample_data() Sample Data: [ 47 samples by 6 sample variables ]
tax_table() Taxonomy Table: [ 960 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 960 tips and 959 internal nodes ]
psN %>% subset_samples(!is.na(pub_id))
phyloseq-class experiment-level object
otu_table() OTU Table: [ 960 taxa and 38 samples ]
sample_data() Sample Data: [ 38 samples by 6 sample variables ]
tax_table() Taxonomy Table: [ 960 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 960 tips and 959 internal nodes ]
# skip the blanks
psN %>% subset_samples(!is.na(pub_id)) %>%
# OTUs must be characterized to phylum
subset_taxa(!is.na(Phylum)& !Phylum %in% c("", "uncharacterized")) -> psN_hasPhylum
psN_hasPhylum
phyloseq-class experiment-level object
otu_table() OTU Table: [ 929 taxa and 38 samples ]
sample_data() Sample Data: [ 38 samples by 6 sample variables ]
tax_table() Taxonomy Table: [ 929 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 929 tips and 928 internal nodes ]
# from 960 to 929 otus
Identifying and removing phyla with very few taxa in them
prevdf = apply(X = otu_table(psN_hasPhylum),
MARGIN = ifelse(taxa_are_rows(psN_hasPhylum), yes = 1, no = 2),
FUN = function(x){sum(x > 0)})
# Add taxonomy and total read counts to this data.frame
prevdf = data.frame(Prevalence = prevdf,
TotalAbundance = taxa_sums(psN_hasPhylum),
tax_table(psN_hasPhylum))
plyr::ddply(prevdf, "Phylum", function(df1){cbind(mean(df1$Prevalence),sum(df1$Prevalence))})
filterPhyla = c("Verrucomicrobia", "Tenericutes", "Elusimicrobia", "Synergistetes")
psN_MainPhyla = subset_taxa(psN_hasPhylum, !Phylum %in% filterPhyla)
psN_MainPhyla
phyloseq-class experiment-level object
otu_table() OTU Table: [ 922 taxa and 38 samples ]
sample_data() Sample Data: [ 38 samples by 6 sample variables ]
tax_table() Taxonomy Table: [ 922 taxa by 8 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 922 tips and 921 internal nodes ]
# Determining abundance threshold
prevdf1 = subset(prevdf, Phylum %in% get_taxa_unique(psN_MainPhyla, "Phylum"))
ggplot(prevdf1, aes(TotalAbundance, Prevalence / nsamples(psN_hasPhylum),color=Phylum)) +
# Include a guess for parameter
geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) + geom_point(size = 2, alpha = 0.7) +
scale_x_log10() + xlab("Total Abundance") + ylab("Prevalence [Frac. Samples]") +
facet_wrap(~Phylum) + theme(legend.position="none")
# Determining abundance threshold
prevdf1 = subset(prevdf, Phylum %in% get_taxa_unique(psN_MainPhyla, "Phylum"))
ggplot(prevdf1, aes(TotalAbundance, Prevalence / nsamples(psN_hasPhylum),color=Phylum)) +
# Include a guess for parameter
geom_hline(yintercept = 0.05, alpha = 0.5, linetype = 2) + geom_point(size = 2, alpha = 0.7) +
scale_x_continuous() + xlab("Total Abundance") + ylab("Prevalence [Frac. Samples]") +
facet_wrap(~Phylum) + theme(legend.position="none")
Make a phyloseq object like psN2 but with all participants, psN2A Code copied from above.
(phyloseq object of relative abundances)
And psN1 (phyloseq object of counts)
psN1A and psN2A include all participants, and will be used to look at variability within participants
psN %>%
# add all the immune data
phylo_join(immune.table, by = "pub_id") %>%
# only use data from humans (no extraction controls)
subset_samples(is.finite(muVisitRank)) %>%
# only otus from known taxa that show up frequently enough
subset_taxa(!is.na(Phylum)& !Phylum %in% c("", "uncharacterized")) %>%
subset_taxa(!Phylum %in% filterPhyla) %>%
# only otus that show up in at least 10% of samples
prevalence_filter_taxa %>%
# convert to relative abundance
tag_phyloseq%>%
# Instead of naming each taxon with its full sequence, we use the "tag" instead
swap.phyloseq.taxnames %>%
pass -> psN1A # Save pre relative abundance transformation
`list_len()` is deprecated as of rlang 0.2.0.
Please use `new_list()` instead.
[90mThis warning is displayed once per session.[39mSetting row names on a tibble is deprecated.
# add is-male
manColumn <- psN1A %>% sample_data %>% as('data.frame') %>% rownames_to_column %>% mutate(isMale = testIsMaleVec(sex)) %>% dplyr::select(rowname, isMale)
psN1A <- phylo_join(psN1A, manColumn, by = 'rowname')
## psN2 is like psN1 but with relative abundances
psN1A %>%
transform_sample_counts(function(x) {x/sum(x)}) %>%
# The "tag" is a new name that takes into account the rest of the taxonomy data
# the tag may need to be updated after any agglomeration
pass-> psN2A
# filter to just microbiome visit 1 and experemental treatments
psN1A %>%
subset_samples(muVisitRank == 1) %>%
subset_samples(ct == "T") %>%
pass -> psN1
psN2A %>%
subset_samples(muVisitRank == 1) %>%
subset_samples(ct == "T") %>%
pass -> psN2
# Calculate weighted unifrac distances and role those in.
psN2.wuf <- phyloseq::distance(psN2, method = "wunifrac")
psN2.pcoa <- capscale(psN2.wuf ~ 1)
psN2.pcoa.df <- psN2.pcoa %>% scores(display = "sites") %>%
as.data.frame %>%
rownames_to_column %>%
dplyr::select('rowname', 'MDS1', 'MDS2') %>%
mutate(rMDS1 = rank(MDS1)) %>% # rank order of MDS1
mutate(rrMDS1 = formatC(format = "d", rMDS1, flag = "0", width=ceiling(log10(max(rMDS1))))) %>%
unite(newname, rrMDS1, rowname, sep = "_", remove = FALSE) %>%
dplyr::select(-rrMDS1)
psN2 %>%
phylo_join(
psN2.pcoa.df,
by = 'rowname'
) -> psN2
## Even if the data are counts,
## the weighted unifrac pcoa is still done on the relative abundances
psN1 %>%
phylo_join(
psN2.pcoa.df,
by = 'rowname'
) -> psN1
psN2A
phyloseq-class experiment-level object
otu_table() OTU Table: [ 651 taxa and 38 samples ]
sample_data() Sample Data: [ 38 samples by 31 sample variables ]
tax_table() Taxonomy Table: [ 651 taxa by 10 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
psN1A
phyloseq-class experiment-level object
otu_table() OTU Table: [ 651 taxa and 38 samples ]
sample_data() Sample Data: [ 38 samples by 31 sample variables ]
tax_table() Taxonomy Table: [ 651 taxa by 10 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
psN2
phyloseq-class experiment-level object
otu_table() OTU Table: [ 651 taxa and 21 samples ]
sample_data() Sample Data: [ 21 samples by 35 sample variables ]
tax_table() Taxonomy Table: [ 651 taxa by 10 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
psN1
phyloseq-class experiment-level object
otu_table() OTU Table: [ 651 taxa and 21 samples ]
sample_data() Sample Data: [ 21 samples by 35 sample variables ]
tax_table() Taxonomy Table: [ 651 taxa by 10 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 651 tips and 650 internal nodes ]
How many taxa were still present after each filtering step?
# Find number of taxa in available samples
psN %>%
# add all the immune data
phylo_join(immune.table, by = "pub_id") %>%
# filter to just microbiome visit 1 and experemental treatments
subset_samples(muVisitRank == 1) %>%
subset_samples(ct == "T") %>%
prevalence_filter_taxa(thresh = 0) %>%
pass-> psInSamples
(NInSamples <- dim(otu_table(psInSamples))[2])
[1] 960
# Number of taxa with unidentified phyla
psInSamples %>%
subset_taxa(!is.na(Phylum)& !Phylum %in% c("", "uncharacterized")) %>%
pass -> psIdentifiedPhylum
(NUnkPhylum <- NInSamples - dim(otu_table(psIdentifiedPhylum))[2])
[1] 31
filterPhyla
[1] "Verrucomicrobia" "Tenericutes" "Elusimicrobia" "Synergistetes"
# Phyla removed because they are in filterPhyla
# -- each of which show up fewer than 20 times in the data set
psIdentifiedPhylum %>%
subset_taxa(!Phylum %in% filterPhyla) %>%
pass -> psNotPhylaFiltered
(NFiltPhyla <- dim(otu_table(psIdentifiedPhylum))[2] - dim(otu_table(psNotPhylaFiltered))[2])
[1] 7
# Taxa removed because there were in fewer than 10% of the samples
psNotPhylaFiltered %>%
prevalence_filter_taxa %>%
pass -> psPFT
dim(otu_table(psNotPhylaFiltered))[2] - dim(otu_table(psPFT))[2]
[1] 386
How to participants’ immune profiles change over time?
# When were participants vaccinated?
# Copied from protocol apendix E
# visitno 1 is a screening visit, I assign it NaN
dayTable = data.frame(
visitno = seq(from = 1, to = 14, by = 1),
day = c(NaN, 0, 14, 28, 42, 84, 98, 168, 182, 196, 273, 364, 455, 545),
month = c(NaN, 0, 0.5, 1, 1.5, 3, 3.5, 6, 6.5, 7, 9, 12, 15, 18)
)
vac <- data.frame(
visitno = c(2, 4, 6, 8)
)
vac <- left_join(vac, dayTable, by = 'visitno')
vac
# Representitive antigens for further considerations
# These are essentially zero (mag = 1) at baseline
ants1 <- c('Con.6.gp120.B', 'ZM96.gp140', 'gp70_B.CaseA_V1_V2')
# These have measurable baseline magnitudes
ants2 <- c('gp41', 'p24')
donor.immune <- psN2 %>% sample_data %>% as('data.frame') %>% dplyr::select(pub_id) %>%
left_join(immune.data, by = 'pub_id')
Column `pub_id` has different attributes on LHS and RHS of join
donor.immune %>% head
psN %>% sample_data %>%
as('data.frame') %>%
filter(!is.na(pub_id)) %>%
pull(pub_id) %>%
unique %>%
pass -> microbiomeCohort
immune.data %>% filter(pub_id %in% microbiomeCohort) %>%
pass -> donor.immune
donor.immune %>% head
options(par0)
iggplot <- immune.data %>%
mutate(inCohort = pub_id %in% microbiomeCohort) %>%
filter(type == 'IgG', antigen %in% c(ants1, ants2)) %>%
mutate(antigen = factor(antigen, levels = c(ants2, ants1))) %>% # reorder facets
ggplot(aes(x = month, y =mag, group = pub_id, colour = inCohort, alpha = inCohort)) +
geom_line() +
geom_point() +
geom_rug(data = vac, aes(x = month), inherit.aes = F, color = 'blue') +
facet_grid(antigen ~ rx_code, labeller = label_wrap_gen()) +
theme_bw() +
theme(strip.text.y = element_text(angle = 0),
axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_log10(breaks = 10^(0:5)) +
scale_colour_manual(values = c("black", "red")) +
scale_alpha_manual(values = c(.6, 1)) +
labs(y = "BAMA Response Magnitude")
ggsave('figures/useiggsAllParticipants.svg')
Saving 7 x 7 in image
# To fix. Control groups don't show up in this version.
iggplot
iggplot <- immune.data %>%
mutate(inCohort = pub_id %in% microbiomeCohort) %>%
filter(type %in% c('IgA', 'CD4+') & antigen %in% c(ants2, 'ANY.ENV.PTEG'))%>%
mutate(antigen = factor(antigen, levels = c(ants2, 'ANY.ENV.PTEG'))) %>% # reorder facets
ggplot(aes(x = month, y =mag, group = pub_id, colour = inCohort, alpha = inCohort)) +
geom_line() +
geom_point() +
geom_rug(data = vac, aes(x = month), inherit.aes = F, color = 'blue') +
facet_grid(antigen ~ rx_code, labeller = label_wrap_gen(), scales = 'free_y') +
theme_bw() +
theme(strip.text.y = element_text(angle = 0),
axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_log10(breaks = 10^c(
seq(from = -2, to = 0, by = 0.25), seq(from = 0, to = 5, by = 1)
), labels = function(x) round(as.numeric(x), digits=3)) +
scale_colour_manual(values = c("black", "red")) +
scale_alpha_manual(values = c(.6, 1)) +
labs(y = "BAMA Response Magnitude")
iggplot
ggsave('figures/useIgACD4AllParticipants.svg')
Saving 7.29 x 4.5 in image
# To fix. Control groups don't show up in this version.
iggplot <- donor.immune %>% filter(type == 'IgG', antigen %in% c(ants1, ants2)) %>%
mutate(antigen = factor(antigen, levels = c(ants2, ants1))) %>% # reorder facets
ggplot(aes(x = month, y =mag, group = pub_id)) + geom_point(alpha = 0.6) + geom_line(alpha = 0.4) +
geom_rug(data = vac, aes(x = month), inherit.aes = F, color = 'blue') +
facet_grid(antigen ~ rx_code, labeller = label_wrap_gen()) +
theme_bw() + theme(strip.text.y = element_text(angle = 0), axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_log10(breaks = 10^(0:5)) +
labs(y = "BAMA Response Magnitude")
iggplot
ggsave('figures/useiggs.svg') # I can no longer save pngs with transparency, going to svg
Saving 7.29 x 4.5 in image
# To fix. Control groups don't show up in this version.
colnames(immune.data)
[1] "visitno" "rx_code" "type" "antigen" "mag" "mag_bl" "response" "day" "month"
[10] "ct" "response_j" "assay" "pub_id"
Plan. Calculate mean and variance of each antigen-type at visit 9, and at baseline.
#
geomean <- function(x, na.rm = FALSE, trim = 0, ...)
{
exp(mean(log(x, ...), na.rm = na.rm, trim = trim, ...))
}
geosd <- function(x, na.rm = FALSE, ...)
{
exp(sd(log(x, ...), na.rm = na.rm, ...))
}
immune.data %>% dplyr::filter(visitno %in% c(9), ct == "T") %>% dplyr::select(pub_id, antigen, visitno, mag, mag_bl, type) %>% group_by(antigen, type) %>%
summarize(mean_mag = geomean(mag), mean_bl = geomean(mag_bl), sd_mag = geosd(mag), sd_bl = geosd(mag_bl)) %>%
mutate(var_over_mean_mag = sd_mag^2/mean_mag, var_over_mean_bl = sd_bl^2/mean_bl)
Mean variance and variance over mean of each value.
immune.data %>% dplyr::filter(visitno %in% c(9), ct == "T") %>% dplyr::select(pub_id,rx_code, antigen, visitno, mag, mag_bl, type) %>% group_by(rx_code, antigen, type) %>%
summarize(mean_mag = geomean(mag), mean_bl = geomean(mag_bl), sd_mag = geosd(mag), sd_bl = geosd(mag_bl)) %>%
mutate(var_over_mean_mag = sd_mag^2/mean_mag, var_over_mean_bl = sd_bl^2/mean_bl) %>%
gather(key = "meas", value = "value", mean_mag, mean_bl, sd_mag, sd_bl, var_over_mean_mag, var_over_mean_bl) %>%
group_by(antigen, type, meas) %>% summarize(mean_val = mean(value)) %>%
spread(key = "meas", value = "mean_val")
As above, but this time, calculated seperately for each treatment group and then those values averaged. ZM96 and gp70 surprisingly large variance over mean. Maybe should look at delta magnitude
immune.data%>% head
immune.data %>%
mutate(mag_delta = mag / mag_bl) %>%
mutate(mag_delta = if_else(mag_delta < 1, 1, mag_delta)) %>%
dplyr::filter(visitno %in% c(9), ct == "T") %>% dplyr::select(pub_id,rx_code, antigen, visitno, mag, mag_bl, mag_delta, type) %>% group_by(rx_code, antigen, type) %>%
summarize(mean_mag = geomean(mag), mean_bl = geomean(mag_bl), mean_delta = geomean(mag_delta), sd_mag = geosd(mag), sd_bl = geosd(mag_bl), sd_delta = geosd(mag_delta)) %>%
mutate(var_over_mean_mag = sd_mag^2/mean_mag, var_over_mean_bl = sd_bl^2/mean_bl, var_over_mean_delta = sd_delta^2/mean_delta) %>%
gather(key = "meas", value = "value", mean_mag, mean_bl, sd_mag, sd_bl, var_over_mean_mag, var_over_mean_bl, mean_delta, sd_delta, var_over_mean_delta) %>%
group_by(antigen, type, meas) %>% summarize(mean_val = mean(value)) %>%
spread(key = "meas", value = "mean_val") %>%
arrange(type)
immune.data %>%
group_by(rx_code) %>%
summarize(Unique_ids = n_distinct(pub_id))
donor.immune %>%
group_by(rx_code) %>%
summarize(Unique_ids = n_distinct(pub_id))
Number of participants with a given day of first sample.
psN2 %>% sample_data %>% data.frame %>% group_by(muVisit) %>% summarize(n = length(muVisit))
How many donors were there from each treatment?
psN2 %>% sample_data %>% data.frame %>% group_by(rx_code) %>% summarize(n = length(muVisit))
Breakdown by both visit and treatment
sampleBreakdown <- psN2 %>% sample_data %>% data.frame %>% group_by(muVisit, rx_code) %>% summarize(n = length(muVisit)) %>% spread(key = rx_code, value = n, fill = 0) %>% mutate(total = T1 + T2 + T3 + T4)
sbcs <- colSums(sampleBreakdown)
sampleBreakdown <- bind_rows(sampleBreakdown, sbcs)
sampleBreakdown %>%
kable("html", escape = F, digits = 3, align = 'c') %>%
#kable_styling("striped", "hover", full_width = F) %>%
#collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass-> sampleBreakdown.html
sampleBreakdown.html
| muVisit | T1 | T2 | T3 | T4 | total |
|---|---|---|---|---|---|
| 2 | 3 | 0 | 0 | 0 | 3 |
| 9 | 4 | 2 | 5 | 0 | 11 |
| 12 | 0 | 1 | 1 | 5 | 7 |
| 23 | 7 | 3 | 6 | 5 | 21 |
sampleBreakdown.html %>% cat(file = 'tables/sampleBreakdown.html')
psN2A.wuf <- phyloseq::distance(psN2A, method = "wunifrac")
psN2A %>% sample_data %>% .[1:5, 1:10]
Sample Data: [5 samples by 10 sample variables]:
psN2A %>% sample_data %>% as("data.frame") %>% rownames_to_column(var = "Sample") %>% dplyr::select(Sample, pub_id) %>%
pass -> S2P
All.equal <- Vectorize(function(x,y){x == y})
## Convert distance matrix into long form matrix
psN2A.wuf %>% as.matrix %>% as.data.frame %>%
rownames_to_column(var = "SampleX")%>%
gather(key = "SampleY", value = "WufDist", -SampleX) %>%
left_join(S2P, by = c("SampleX" = "Sample")) %>% rename(pub_id_x = pub_id) %>%
left_join(S2P, by = c("SampleY" = "Sample")) %>% rename(pub_id_y = pub_id) %>%
mutate(SampleX = as.numeric(str_extract(SampleX, "[0-9][0-9]"))) %>%
mutate(SampleY = as.numeric(str_extract(SampleY, "[0-9][0-9]"))) %>%
## discard diagonal discard and upper half of triangular matrix
filter(SampleX < SampleY) %>%
mutate(isSamePerson = All.equal(pub_id_x, pub_id_y)) %>%
# ## discard cases where pub_id is unknown
# filter(is.finite(pub_id_x) & is.finite(pub_id_y))
pass -> AllWufDist
AllWufDist %>% head
#AllWufDist %>% ggplot(aes(x = isSamePerson, y = WufDist)) + geom_violin() + geom_dotplot(binaxis = "y", stackdir = "center", binwidth = .005)
Get Mean values for between participant and within participant weighted unifrac distances.
WufMeans <- AllWufDist %>% group_by(isSamePerson) %>% summarize(mean = mean(WufDist))
WufMeans
#SameAndDiff <- data.frame(comparison = c("different", "same"), WufMeans$mean)
SameAndDiff <- WufMeans %>% spread(key = isSamePerson, value = mean) %>% rename(between = 'FALSE', within = 'TRUE') %>% mutate(diff = between-within)
SameAndDiff
#SameAndDiff[1,2] - SameAndDiff[2,2] # Difference in weighted unifrac dissimilarity between same and different partcipants
Bootstrap some confidence intervals of within and between participant weighted unifrac distances.
# Split the data
SamePersonWufDist <- AllWufDist %>% filter(isSamePerson) #%>% dplyr::select(WufDist)
DifferentPersonWufDist <- AllWufDist %>% filter(!isSamePerson)
set.seed(334)
bootsSame <- rsample::bootstraps(SamePersonWufDist, times = 10000)
bootsDifferent <- rsample::bootstraps(DifferentPersonWufDist, times = 10000)
mean_of_bootstrap <- function(split){
locVals <- rsample::analysis(split)$WufDist
mean(locVals)
}
boot_meansSame <- bootsSame %>% mutate(mean = map_dbl(splits, mean_of_bootstrap)) %>% dplyr::select(mean)
boot_meansDifferent <- bootsDifferent %>% mutate(mean = map_dbl(splits, mean_of_bootstrap)) %>% dplyr::select(mean)
boot_means <- bind_cols(within = boot_meansSame$mean, between = boot_meansDifferent$mean) %>%
mutate(isDifferentBigger = between>within,
DifMinusSame = between - within)
#boot_means
1- sum(boot_means$isDifferentBigger)/length(boot_means$isDifferentBigger)
[1] 0.0116
The above is a bootstrapped P value that the two are different from eachother. Per a conversation I had with Klaus Hubert. Still need to find a justification that this approach is legit.
boot_means %>% ggplot(aes(x = DifMinusSame)) + geom_histogram()
A histogram of bootstrapped differences between within participant and between participant mean values.
quantile(boot_means$DifMinusSame, c(0.025, 0.5, 0.975))
2.5% 50% 97.5%
0.01288475 0.08727948 0.15505984
95% confidence intervals of the differences between same and different person microbiota.
PermWufDist <- AllWufDist %>% modelr::permute(10000, WufDist)
mean_of_is_same <- function(df){df %>% as.data.frame %>% group_by(isSamePerson) %>% summarize(mean(WufDist)) %>% spread(isSamePerson, `mean(WufDist)`)}
test <- PermWufDist %>% pull(perm) %>% .[[1]] %>% as.data.frame
test %>% head
mean_of_is_same(test)
SameAndDiff
permutedMeans <- PermWufDist %>% mutate(meanvals = map(perm, mean_of_is_same)) %>% dplyr::select(meanvals) %>% unnest %>%
rename(between = `FALSE`, within = `TRUE`) %>%
mutate(diff = between - within, absdif = abs(diff)) %>%
mutate(isExtreme = absdif >= SameAndDiff$diff, isExtreme1Tail = diff >= SameAndDiff$diff)
#permutedMeans %>% ggplot(aes(x = diff)) + geom_histogram() + geom_vline(xintercept = SameAndDiff$diff)
permutedMeans %>% ggplot(aes(x = diff)) + geom_histogram() + geom_vline(xintercept = SameAndDiff$diff)
Fraction of permuted values less extreme than difference between same and different.
#permutedMeans %>% mutate(isExtreme = absdif >= SameAndDiff$diff)
sum(permutedMeans$isExtreme) / length(permutedMeans$isExtreme)
[1] 0.0153
sum(permutedMeans$isExtreme1Tail) / length(permutedMeans$isExtreme1Tail)
[1] 0.0073
Two and one tailed permuted p-values
options(repr.plot.width=6, repr.plot.height= 6)
boot_means %>% dplyr::select(within, between) %>% gather(key = "comparison", value = "WufDist") %>% ggplot(aes(x = comparison, y = WufDist)) +
geom_dotplot(data = AllWufDist %>% mutate(isSamePerson2 = if_else(isSamePerson, "within", "between"))
, aes(x = isSamePerson2, y = WufDist), binaxis = "y", stackdir = "center", binwidth = .01, colour = "gray40", fill = "white") +
geom_violin(fill = NA) +
geom_point(data = data.frame(comparison = c("within", "between"), WufDist = WufMeans$mean), aes(x = comparison, y = c(WufDist[2], WufDist[1])), shape = 22, fill = "black", size = 2)+
theme_bw() +
scale_x_discrete(limits = c("within", "between")) + labs(y = "Weighted Unifrac Distance") #
ggsave('figures/BetweenVsWithin.png')
Saving 7.29 x 4.5 in image
Figure: Open circles represent weighed unifrac distances associated with pairs of samples taken from the same set of participants, at different time points (“within”), and samples taken from different sets of participants (“between”). Black squares indicate the observed mean of the within and between values. Violins indicate distributions of bootstrapped mean values.
psN2.wuf <- phyloseq::distance(psN2, method = "wunifrac")
psN2.pcoa <- capscale(psN2.wuf ~ 1)
# How much variance si explained by each weighted unifrac axis
# Note, ten axes cover 95% of the variance.
# I'm not going to look beyond that for any test.
data.frame(eig = psN2.pcoa$CA$eig) %>%
rownames_to_column('axis') %>%
mutate(proportion = eig/sum(eig)) %>%
mutate(cumulative = cumsum(proportion))
my_breaks = c(1, 75, 250, 500, 1000,2000)
psN2 %>% mutate_phyloseq_sample(
mc41 = factor(medcode_hl(IgG_gp41_Month_0)),
log120 = (IgG_Con.6.gp120.B_Month_12)) -> psN2_mod
psN2_mod%>%
sample_data() %>%
ggplot(aes(x = MDS1, y = MDS2)) + geom_point(aes(fill = mc41), size = 5, stroke = 1, shape = 21) +
coord_fixed(sqrt(psN2.pcoa$CA$eig[2]/psN2.pcoa$CA$eig[1])) +
viridis::scale_fill_viridis(name = 'gp41 Baseline', direction = -1, discrete = TRUE) +
#scale_colour_manual(name = 'gp41 Primary', values = c('black', 'grey70')) +
theme_bw() -> wuford_gp41
psN2_mod %>%
sample_data() %>%
ggplot(aes(x = MDS1, y = MDS2)) + geom_point(aes(fill = log120), size = 5, stroke = 1, shape = 21) +
coord_fixed(sqrt(psN2.pcoa$CA$eig[2]/psN2.pcoa$CA$eig[1])) +
viridis::scale_fill_viridis(name = 'gp120 Final', direction = 1, trans = "sqrt",
breaks = my_breaks, labels = my_breaks) +
#scale_colour_manual(name = 'gp41 Primary', values = c('black', 'grey70')) +
theme_bw() -> wuford_gp120
par <- options()
options(repr.plot.width=11, repr.plot.height= 4)
#g <- grid.arrange(wuford_gp41, wuford_gp120, ncol = 2)
g <- cowplot::plot_grid(wuford_gp41, wuford_gp120, ncol = 2, labels = c("A", "B"), label_size = 24)
g
#ggsave('figures/wunifrac_Agp41_Bgp120_pcoa.png', g, width = 8, height = 4)
#ggsave('figures/wunifrac_Agp41_Bgp120_pcoa.svg', g, width = 8, height = 4)
cowplot::save_plot('figures/wunifrac_Agp41_Bgp120_pcoa.png', g, base_width = 8, base_height = 4)
cowplot::save_plot('figures/wunifrac_Agp41_Bgp120_pcoa.svg', g, base_width = 8, base_height = 4)
wufKN2 <- D2K(as.matrix(psN2.wuf))
muDoners <- unique(sample_data(psN2)$pub_id)
immune.data %>%
filter(pub_id %in% muDoners) %>%
filter(
(type == 'IgG' &
antigen %in% ants1 &
month %in% c(6.5,12)
) |
(type %in% c('IgG', 'IgA') &
antigen %in% ants2 &
month %in% c(0,6.5,12)
) |
type == 'CD4+' &
antigen == 'ANY.ENV.PTEG' &
month %in% c(6.5, 12)
)-> use.immune
head(use.immune)
# Do permanova and related tests to a variable of interest
# This function is pretty specific to this analysis, so I'm going to leave it
# here in the notebook file
CapVar <- function(x, nperm = 9999, transformation = medcode2, family = 'binomial'){
## Pull out the needed data
psN2.wMDS <- psN2 %>% phylo_join(scores(psN2.pcoa, display = "sites", choices = 1:10) %>%
as.data.frame %>% rownames_to_column, by = 'rowname')
medWuf <- NA
rankWuf <- NA
locPS <- phylo_join(psN2.wMDS, x, by = 'pub_id')
ydata0 <- sample_data(locPS)$mag
yna <- is.na(ydata0)
#loc.wuf <- wufKN2
#loc.jsd <- jsdKN2
ydata <- ydata0
ydata <- ydata0[!yna]
loc.wuf2 <- psN2.wuf %>% as.matrix %>% .[!yna, !yna]
medWuf <- adonis(loc.wuf2 ~ transformation(ydata), permutations = nperm)
#medWuf$aov.tab[1,c('R2', 'Pr(>F)')]
## Capscale returns the same results as adonis (permanova), but also gives some other interesting results
medWufCap <- capscale(loc.wuf2 ~ transformation(ydata))
capanova <- anova(medWufCap, permutations = nperm)
samDf <- locPS %>% sample_data %>% as('data.frame') %>% rownames_to_column %>%
left_join(
vegan::scores(medWufCap, display = 'sites') %>% as.data.frame %>% dplyr::select(CAP1) %>%
rownames_to_column, by = 'rowname') %>% .[!yna,]
# # Is giving only positive results with CAP1, not sure why
# glmAnova <- glm(medcode(ydata) ~ MDS1 + CAP1, data = samDf, family = 'binomial') %>% anova(test = "Chisq")
loc_glm <- glm(transformation(ydata) ~ MDS1, data = samDf, family = family)
glmAnova <- loc_glm %>% anova(test = "Chisq")
#glmAnova['CAP1', 'Deviance']/out_capanova['NULL', 'Resid. Dev']
## check against mirkat
loc.Kwuf2 <- wufKN2[!yna, !yna]
mirkatP <- MiRKAT(y = transformation(ydata), Ks = loc.Kwuf2, out_type = "C", method = 'permutation', nperm = nperm)
#list(medWuf, capanova, mirkatP)
pred_pct <- predict(loc_glm, type = "response")
pred_01 <- as.numeric(predict(loc_glm, type = "response") > 0.5)
accuracy <- mean(transformation(ydata) == pred_01)
null_glm <- update(loc_glm, ~1)
# Canonical caluclation of McFadden's R2 for the GLM
McFadden = 1- (logLik(loc_glm)/ logLik(null_glm))
L.full = logLik(loc_glm)
D.full = -2 * L.full
L.base = logLik(null_glm)
D.base = -2 * L.base
n = dim(samDf)[1]
Nagelkerke = (1 - exp((D.full - D.base)/n))/(1 - exp(-D.base/n))
# A GLM of all weighted unifrac components
data.frame(
caps.P = capanova['Model', 'Pr(>F)'],
adonisP = medWuf$aov.tab[1, 'Pr(>F)'],
mir.P = mirkatP,
caps.F = capanova['Model', 'F'],
caps.R2 = medWufCap$CCA$tot.chi/medWufCap$tot.chi,
wuf1.P = glmAnova['MDS1', 'Pr(>Chi)'],
wuf1.DR = glmAnova['MDS1', 'Deviance'] / glmAnova['NULL', 'Resid. Dev'],
wuf1.McFadden = Nagelkerke,
accuracy,
wuf1.coef = coef(loc_glm)[2]
#cap1.P = glmAnova['CAP1', 'Pr(>Chi)'],
#cap1.R2 = glmAnova['CAP1', 'Deviance'] / glmAnova['NULL', 'Resid. Dev']
)
}
use.immune %>%
filter(type == 'IgG' & antigen == 'gp41'& month == 0 & ct == 'T') -> test.immune1
# Just confirming that the function works before it goes in a giant loop. I'd delete this,
# but i'll just end up needing it again if I do.
ptm = proc.time()
tps <- CapVar(test.immune1, nperm = 9999, transformation = medcode, family = 'binomial')
proc.time() - ptm
user system elapsed
1.481 0.132 1.743
tps
use.immune %>%
filter(type == 'CD4+' & month == 6.5 & ct == 'T') -> test.immune.pteg
# Just confirming that the function works before it goes in a giant loop. I'd delete this,
# but i'll just end up needing it again if I do.
ptm = proc.time()
tps <- CapVar(test.immune.pteg, nperm = jnperm, transformation = medcode, family = 'binomial')
proc.time() - ptm
user system elapsed
1.482 0.015 1.533
tps
# Run above function against every relevant variable.
ptm <- proc.time()
use.immune %>%
filter(ct == 'T') %>%
group_by(type, antigen, month) %>%
do(data.frame(CapVar(., nperm = jnperm))) -> permKernTable
`env_bind_fns()` is deprecated as of rlang 0.3.0.
Please use `env_bind_active()` instead.
[90mThis warning is displayed once per session.[39m`new_overscope()` is deprecated as of rlang 0.2.0.
Please use `new_data_mask()` instead.
[90mThis warning is displayed once per session.[39m`overscope_eval_next()` is deprecated as of rlang 0.2.0.
Please use `eval_tidy()` with a data mask instead.
[90mThis warning is displayed once per session.[39m
|========== | 10% ~20 s remaining
|=============== | 15% ~21 s remaining
|==================== | 20% ~19 s remaining
|========================= | 25% ~18 s remaining
|============================== | 30% ~17 s remaining
|=================================== | 35% ~15 s remaining
|======================================== | 40% ~14 s remaining
|============================================= | 45% ~13 s remaining
|================================================== | 50% ~12 s remaining
|======================================================= | 55% ~11 s remaining
|============================================================ | 60% ~10 s remaining
|================================================================= | 65% ~8 s remaining
|====================================================================== | 70% ~7 s remaining
|=========================================================================== | 75% ~6 s remaining
|================================================================================ | 80% ~5 s remaining
|===================================================================================== | 85% ~4 s remaining
|========================================================================================== | 90% ~3 s remaining
|=============================================================================================== | 95% ~1 s remaining
|=====================================================================================================|100% ~0 s remaining
`overscope_clean()` is deprecated as of rlang 0.2.0.
[90mThis warning is displayed once per session.[39m
permKernTable
proc.time() - ptm
user system elapsed
24.591 0.135 24.940
The above function runs several extra tests. Results as follows:
type antigen visitno - things we run over
caps.P - Capscale test asks whether if we rotate things a bit and then try to use the best axis to compare to the data. Its similar to the wuf1.P value, but with some rotation
adonisP - p-value for a permanova test. Similar to mirkat p-value. One key exception is that igg_gp41_Month_0 falls on different sides of the 0.05 threshold.
mir.P is the p value for the kernel regression test, as run in the MiRKAT package. (Zhao et al., 2015)
caps.F and caps R2 are the f and r squared values for the capscale test.
wuf.P - is the p value of a glm comparing weighted unifrac component one against variables of interest. This test appears to always be statistically significantly positive when the mirkat test is positve.
wuf1.DR - one way of calculating an R2 value from a glm. We devide the deviance by the residual deviance
wuf1.McFadden - is a McFadden’s pseudo R^2. This turns out to be identical to the previous calculation.
accuracy - the fraction of the time that the glm predicts something falls above or below the median correctly. This turns out to not be super informative. Everything has around a 60% accuracy.
wuf1.coef - the coefficient of the glm model. The sign is relevant. Things with postive sign are associated with high values of weighted unifrac axis 1.
# Clean up so we just see the results of the kernel regression
concisePermKernTable <- permKernTable %>% ungroup %>%
mutate(Kernel_Q = p2q(mir.P), MDS1_Q = p2q(wuf1.P)) %>%
dplyr::select(Type = type, Antigen = antigen,Month = month, Kernel_P = mir.P, Kernel_Q,
MDS1_P = wuf1.P, MDS1_Q, GlmMDS1_R2 = wuf1.McFadden, MDS1_Coef = wuf1.coef) %>%
as.data.frame %>%
pass -> concisePermKernTable
write.csv(format(concisePermKernTable, digits = 3), 'tables/concisePermkernTable.csv')
# export conditionally formatted table as html
colNames1 = c(' ' = 3, 'Kernel' = 2, 'MDS' = 4)
colNames2 = c('Type', 'Antigen', 'Month', 'P', 'Q', 'P', 'Q', 'R2', 'Coef' )
concisePermKernTable %>%
mutate(
# this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 3), "html",
bold = ifelse(Kernel_P < 0.05,
T,
F),
italic = ifelse(Kernel_P < 0.05 & MDS1_Coef < 0,
T,
F),
background = ifelse(Kernel_P < 0.05, ifelse(MDS1_Coef < 0, "lightsalmon", "lightblue"), "")
),
Kernel_P = cell_spec(format_round(Kernel_P, 3), "html",
bold = ifelse(Kernel_P < 0.05, T, F),
background = ifelse(Kernel_P < 0.05, 'yellow', '')
),
Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "html",
bold = ifelse(Kernel_Q < 0.2, T, F),
background = ifelse(Kernel_Q < 0.2, 'lightyellow', '')
),
MDS1_P = cell_spec(format_round(MDS1_P, 3), "html",
bold = ifelse(MDS1_P < 0.05, T, F),
background = ifelse(MDS1_P < 0.05, 'yellow', '')
),
MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "html",
bold = ifelse(MDS1_Q < 0.2, T, F),
background = ifelse(MDS1_Q < 0.2, 'lightyellow', '')
),
#Month = cell_spec(format_round(Month,0), "html")
Month = cell_spec(Month, "html")
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable
toTable %>%
kable("html", escape = F, digits = 3, align = 'c', col.names = colNames2) %>%
kable_styling("striped", "hover", full_width = F) %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") -> concisePermKernTable.html
concisePermKernTable.html
| Type | Antigen | Month | P | Q | P | Q | R2 | Coef |
|---|---|---|---|---|---|---|---|---|
| CD4+ | Any ENV PTEG | 6.5 | 0.249 | 0.498 | 0.218 | 0.125 | 0.093 | -0.937 |
| 12 | 0.243 | 0.498 | 0.228 | 0.125 | 0.094 | -0.920 | ||
| IgA | gp41 | 0 | 0.956 | 0.956 | 0.651 | 0.219 | 0.013 | 0.333 |
| 6.5 | 0.209 | 0.498 | 0.143 | 0.109 | 0.129 | -1.133 | ||
| 12 | 0.744 | 0.930 | 0.855 | 0.259 | 0.002 | -0.136 | ||
| p24 | 0 | 0.896 | 0.952 | 0.320 | 0.161 | 0.061 | 0.752 | |
| 6.5 | 0.904 | 0.952 | 0.650 | 0.219 | 0.013 | -0.333 | ||
| 12 | 0.376 | 0.578 | 0.387 | 0.172 | 0.049 | -0.657 | ||
| IgG | Con.6.gp120.B | 6.5 | 0.004 | 0.076 | 0.002 | 0.011 | 0.497 | -3.104 |
| 12 | 0.032 | 0.153 | 0.010 | 0.017 | 0.361 | -2.289 | ||
| gp41 | 0 | 0.046 | 0.153 | 0.014 | 0.017 | 0.331 | 2.185 | |
| 6.5 | 0.046 | 0.153 | 0.030 | 0.031 | 0.267 | -1.809 | ||
| 12 | 0.644 | 0.858 | 0.779 | 0.248 | 0.005 | -0.205 | ||
| gp70 B.CaseA V1-V2 | 6.5 | 0.904 | 0.952 | 0.619 | 0.219 | 0.016 | -0.365 | |
| 12 | 0.034 | 0.153 | 0.014 | 0.017 | 0.332 | -2.135 | ||
| p24 | 0 | 0.214 | 0.498 | 0.444 | 0.179 | 0.037 | -0.567 | |
| 6.5 | 0.425 | 0.608 | 0.397 | 0.172 | 0.047 | -0.749 | ||
| 12 | 0.284 | 0.501 | 0.159 | 0.109 | 0.120 | -1.085 | ||
| ZM96.gp140 | 6.5 | 0.016 | 0.153 | 0.009 | 0.017 | 0.370 | -2.338 | |
| 12 | 0.300 | 0.501 | 0.162 | 0.109 | 0.118 | -1.076 |
concisePermKernTable.html %>% cat(file = 'tables/concisePermkernTable.html')
Latex version of the same table
docHead <- "\\documentclass[12pt]{article} % use larger type; default would be 10pt
\\usepackage[utf8]{inputenc} % set input encoding (not needed with XeLaTeX)
\\usepackage{booktabs}
\\usepackage{longtable}
\\usepackage{array}
\\usepackage{multirow}
\\usepackage[table]{xcolor}
\\usepackage{wrapfig}
\\usepackage{float}
\\usepackage{colortbl}
\\usepackage{pdflscape}
\\usepackage{tabu}
\\usepackage{threeparttable}
\\usepackage{threeparttablex}
\\usepackage[normalem]{ulem}
\\usepackage{makecell}
\\definecolor{green}{rgb}{1, 1, .9}
\\begin{document}
"
docTail <- "\\end{document}
"
# Make latex table
concisePermKernTable %>%
mutate(
# this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 3), "latex",
bold = ifelse(MDS1_P < 0.05,
T,
F),
italic = ifelse(MDS1_P < 0.05 & MDS1_Coef < 0,
T,
F)),
Kernel_P = cell_spec(format_round(Kernel_P, 3), "latex",
bold = ifelse(Kernel_P < 0.05, T, F),
background = ifelse(Kernel_P < 0.05, 'yellow', 'white')
),
Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "latex",
bold = ifelse(Kernel_Q < 0.2, T, F),
background = ifelse(Kernel_Q < 0.2, 'green', 'white')
),
MDS1_P = cell_spec(format_round(MDS1_P, 3), "latex",
bold = ifelse(MDS1_P < 0.05, T, F),
background = ifelse(MDS1_P < 0.05, 'yellow', 'white')
),
MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "latex",
bold = ifelse(MDS1_Q < 0.2, T, F),
background = ifelse(MDS1_Q < 0.2, 'green', 'white')
),
#Month = cell_spec(format_round(Month,0), "html")
Month = cell_spec(Month, "latex")
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable
toTable %>%
kable("latex", escape = F, digits = 3, align = 'c', col.names = colNames2, booktabs = T) %>%
kable_styling(position = "left") %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass -> concisePermKernTable.tex
# Print latex table to tex file
cat(docHead, concisePermKernTable.tex, docTail, file = 'tables/concisePermkernTable1.tex')
concisePermKernTable %>% filter(Kernel_P < 0.05) -> shortPermkernTable
shortPermkernTable
write.csv(format(shortPermkernTable, digits = 3), 'tables/shortPermkernTable.csv')
Of kernel regression P values
my_runif <- function(Len){
loc_runif <- runif(n = Len)
sort_loc_runif <- sort(loc_runif)
data.frame(case = 1:Len, relement = sort_loc_runif)
}
calc_bound <- function(df, bound){
quantile(df$relement, bound)
}
make_qqdata <- function(pvec, nboot = 10000){
locP <- pvec
LP <- length(locP)
#1:10 %>% map(runif, n = LP)
sortedP <- sort(locP)
exp2 <- 1:length(locP)/length(locP)
sortedExpP <- sort(exp2)
random_pvalues <- data.frame(iter = 1:nboot) %>%
mutate(rand = map(iter, ~my_runif(Len = LP))) %>% unnest %>%
nest(-case) %>%
mutate(lb = map_dbl(data, ~calc_bound(., 0.025)),
ub = map_dbl(data, ~calc_bound(.,0.975))) %>% dplyr::select (-data)
qqdata <- bind_cols(sortedP = sortedP, sortedExpP = sortedExpP, random_pvalues)
return(qqdata)
}
qqdata_permKernMir <- make_qqdata(permKernTable$mir.P)
qqdata_permKernMir %>% str
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 20 obs. of 5 variables:
$ sortedP : num 0.0038 0.0163 0.0324 0.0339 0.0457 ...
$ sortedExpP: num 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 ...
$ case : int 1 2 3 4 5 6 7 8 9 10 ...
$ lb : num 0.00128 0.01199 0.03215 0.05669 0.0869 ...
$ ub : num 0.166 0.253 0.317 0.378 0.437 ...
options(repr.plot.width=6, repr.plot.height= 6)
qqdata_permKernMir %>% ggplot(aes(x = sortedExpP, y = sortedP)) + geom_point(shape = 1) + geom_abline(intercept = 0, slope = 1) +
labs(x = expression(paste("Expected ", italic("p"),"-value")), y = expression(paste("Observed ", italic("p"),"-value")))
ggsave('figures/qq_permKernMir.png')
Saving 7.29 x 4.5 in image
qqdata_permKernMir %>% ggplot(aes(x = sortedExpP, y = sortedP)) + geom_point(shape = 1) + geom_abline(intercept = 0, slope = 1) + geom_line(aes(y = lb), colour = "grey40") + geom_line(aes(y = ub), colour = "grey40") + scale_y_log10() + scale_x_log10()
We see from the qqplot that we generally fall below the 1:1 line, suggesting that our P values are lower than we would expect from chance. To make the grey lines, I sampled 95% confidence intervals of random draws from the uniform distribution (essentially random p values). The data points sometimes, but not always fall below the lower one of these. However, we care about all of the p values, not wheter eaach individual is more unusual than we would expect by chance. I’ll replort just the regular qqplot as a supplemental figure.
ptm = proc.time()
tps <- CapVar(test.immune1, nperm = 9999, transformation = function(x){jac_box_cox(x)}, family = 'gaussian')
proc.time() - ptm
user system elapsed
1.521 0.072 1.649
tps
# Run above function against every relevant variable.
ptm <- proc.time()
use.immune %>%
filter(ct == 'T') %>%
group_by(type, antigen, month) %>%
do(data.frame(CapVar(., nperm = jnperm,
transformation = function(x){jac_box_cox(x)},
family = 'gaussian'))) -> permKernTableGaus
|========== | 10% ~29 s remaining
|=============== | 15% ~28 s remaining
|==================== | 20% ~26 s remaining
|========================= | 25% ~25 s remaining
|============================== | 30% ~24 s remaining
|=================================== | 35% ~23 s remaining
|======================================== | 40% ~22 s remaining
|============================================= | 45% ~20 s remaining
|================================================== | 50% ~19 s remaining
|======================================================= | 55% ~17 s remaining
|============================================================ | 60% ~2 m remaining
|================================================================= | 65% ~2 m remaining
|====================================================================== | 70% ~1 m remaining
|=========================================================================== | 75% ~1 m remaining
|================================================================================ | 80% ~46 s remaining
|===================================================================================== | 85% ~33 s remaining
|========================================================================================== | 90% ~21 s remaining
|=============================================================================================== | 95% ~10 s remaining
|=====================================================================================================|100% ~0 s remaining
permKernTableGaus
proc.time() - ptm
user system elapsed
347.120 9.861 187.660
# Clean up so we just see the results of the kernel regression
concisePermKernTableGaus <- permKernTableGaus %>% ungroup %>%
mutate(Kernel_Q = p2q(mir.P), MDS1_Q = p2q(wuf1.P)) %>%
dplyr::select(Type = type, Antigen = antigen, Month = month, Kernel_P = mir.P, Kernel_Q,
MDS1_P = wuf1.P, MDS1_Q, MDS1_R2 = wuf1.McFadden, MDS1_Coef = wuf1.coef) %>%
as.data.frame %>%
pass
concisePermKernTableGaus
write.csv(format(concisePermKernTableGaus, digits = 3), 'tables/concisePermkernTableGaus.csv')
# export conditionally formatted table as html
concisePermKernTableGaus %>%
mutate(
# this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 3), "html",
bold = ifelse(Kernel_P < 0.05,
T,
F),
italic = ifelse(Kernel_P < 0.05 & MDS1_Coef < 0,
T,
F),
background = ifelse(Kernel_P < 0.05, ifelse(MDS1_Coef < 0, "lightsalmon", "lightblue"), "")
),
Kernel_P = cell_spec(format_round(Kernel_P, 3), "html",
bold = ifelse(Kernel_P < 0.05, T, F),
background = ifelse(Kernel_P < 0.05, 'yellow', '')
),
Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "html",
bold = ifelse(Kernel_Q < 0.2, T, F),
background = ifelse(Kernel_Q < 0.2, 'lightyellow', '')
),
MDS1_P = cell_spec(format_round(MDS1_P, 3), "html",
bold = ifelse(MDS1_P < 0.05, T, F),
background = ifelse(MDS1_P < 0.05, 'yellow', '')
),
MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "html",
bold = ifelse(MDS1_Q < 0.2, T, F),
background = ifelse(MDS1_Q < 0.2, 'lightyellow', '')
),
Month = cell_spec(Month, "html")
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable
toTable %>%
kable("html", escape = F, digits = 3, align = 'c', col.names = colNames2) %>%
kable_styling("striped", "hover", full_width = F) %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") -> concisePermKernTableGaus.html
concisePermKernTableGaus.html
| Type | Antigen | Month | P | Q | P | Q | R2 | Coef |
|---|---|---|---|---|---|---|---|---|
| CD4+ | Any ENV PTEG | 6.5 | 0.610 | 0.813 | 0.597 | 0.344 | 0.016 | -1.90e-01 |
| 12 | 0.183 | 0.488 | 0.329 | 0.248 | 0.054 | -3.39e-01 | ||
| IgA | gp41 | 0 | 0.989 | 0.989 | 0.633 | 0.344 | 0.012 | 3.86e-01 |
| 6.5 | 0.407 | 0.626 | 0.601 | 0.344 | 0.015 | -3.22e-01 | ||
| 12 | 0.652 | 0.814 | 0.504 | 0.329 | 0.025 | 4.63e-01 | ||
| p24 | 0 | 0.951 | 0.989 | 0.433 | 0.303 | -0.001 | 1.05e-02 | |
| 6.5 | 0.977 | 0.989 | 0.857 | 0.441 | 0.002 | -1.02e-01 | ||
| 12 | 0.577 | 0.813 | 0.309 | 0.248 | 0.057 | -4.16e-01 | ||
| IgG | Con.6.gp120.B | 6.5 | 0.075 | 0.374 | 0.032 | 0.093 | 0.196 | -7.18e+03 |
| 12 | 0.001 | 0.018 | 0.000 | 0.000 | 0.498 | -3.17e+01 | ||
| gp41 | 0 | 0.220 | 0.488 | 0.038 | 0.093 | 0.192 | 9.17e-01 | |
| 6.5 | 0.212 | 0.488 | 0.080 | 0.102 | 0.144 | -8.10e-01 | ||
| 12 | 0.257 | 0.505 | 0.173 | 0.169 | 0.089 | -8.66e+00 | ||
| gp70 B.CaseA V1-V2 | 6.5 | 0.278 | 0.505 | 0.206 | 0.183 | 0.082 | -4.70e-01 | |
| 12 | 0.108 | 0.434 | 0.083 | 0.102 | 0.138 | -1.49e+00 | ||
| p24 | 0 | 0.757 | 0.890 | 0.907 | 0.444 | 0.001 | 8.41e-02 | |
| 6.5 | 0.025 | 0.197 | 0.053 | 0.102 | 0.172 | -9.57e+39 | ||
| 12 | 0.407 | 0.626 | 0.133 | 0.145 | 0.106 | -6.03e+03 | ||
| ZM96.gp140 | 6.5 | 0.030 | 0.197 | 0.017 | 0.083 | 0.244 | -8.54e-01 | |
| 12 | 0.199 | 0.488 | 0.078 | 0.102 | 0.142 | -1.49e+00 |
concisePermKernTableGaus.html %>% cat(file = 'tables/concisePermkernTableGaus.html')
concisePermKernTableGaus %>% filter(Kernel_P < 0.05) -> shortPermkernTableGaus
shortPermkernTableGaus
write.csv(format(shortPermkernTableGaus, digits = 3), 'tables/shortPermkernTable.csv')
# Make latex table
concisePermKernTableGaus %>%
mutate(
# this row needs to happen first, since the reformatting of the nother numbers makes them harder to call
MDS1_Coef = cell_spec(format(MDS1_Coef, digits = 3), "html",
bold = ifelse(Kernel_P < 0.05,
T,
F),
italic = ifelse(Kernel_P < 0.05 & MDS1_Coef < 0,
T,
F),
background = ifelse(Kernel_P < 0.05, ifelse(MDS1_Coef < 0, "lightsalmon", "lightblue"), "")
),
Kernel_P = cell_spec(format_round(Kernel_P, 3), "latex",
bold = ifelse(Kernel_P < 0.05, T, F),
background = ifelse(Kernel_P < 0.05, 'yellow', 'white')
),
Kernel_Q = cell_spec(format_round(Kernel_Q, 3), "latex",
bold = ifelse(Kernel_Q < 0.2, T, F),
background = ifelse(Kernel_Q < 0.2, 'green', 'white')
),
MDS1_P = cell_spec(format_round(MDS1_P, 3), "latex",
bold = ifelse(MDS1_P < 0.05, T, F),
background = ifelse(MDS1_P < 0.05, 'yellow', 'white')
),
MDS1_Q = cell_spec(format_round(MDS1_Q, 3), "latex",
bold = ifelse(MDS1_Q < 0.2, T, F),
background = ifelse(MDS1_Q < 0.2, 'green', 'white')
),
#Month = cell_spec(format_round(Month,0), "html")
Month = cell_spec(Month, "latex")
) %>%
mutate(Antigen = gsub('ANY.ENV.PTEG', 'Any ENV PTEG', Antigen)) %>%
mutate(Antigen = gsub('gp70_B.CaseA_V1_V2', 'gp70 B.CaseA V1-V2', Antigen)) -> toTable
toTable %>%
kable("latex", escape = F, digits = 3, align = 'c', col.names = colNames2, booktabs = T) %>%
kable_styling(position = "left") %>%
add_header_above(colNames1) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass -> concisePermKernTableGaus.tex
# Print latex table to tex file
cat(docHead, concisePermKernTableGaus.tex, docTail, file = 'tables/concisePermkernTableGaus.tex')
### QQplot for kernel regression data
qqdata_permKernMirGaus <- make_qqdata(permKernTableGaus$mir.P)
options(repr.plot.width=6, repr.plot.height= 6)
qqdata_permKernMirGaus %>% ggplot(aes(x = sortedExpP, y = sortedP)) + geom_point(shape = 1) + geom_abline(intercept = 0, slope = 1) +
labs(x = expression(paste("Expected ", italic("p"),"-value")), y = expression(paste("Observed ", italic("p"),"-value")))
ggsave('figures/qq_permKernMirGaus.png')
Saving 7.29 x 4.5 in image
use.immune %>% dplyr::select(pub_id, visitno, type, antigen, mag) -> tmp
full_join(tmp, tmp, by = 'pub_id') %>%
group_by(visitno.x, type.x, antigen.x, visitno.y, type.y, antigen.y) %>%
nest %>%
mutate(x2 = map(data, function(df){unwarn(chisq.test(df$mag.x, df$mag.y))})) %>%
mutate(glance = map(x2, glance)) %>%
dplyr::select(-data, -x2) %>%
unnest(glance) %>%
#mutate(q.value = p2q(p.value)) %>% # reurns NaNs
pass -> compareImmuneX2
compareImmuneX2 %>% filter(
type.x == 'IgG' &
antigen.x == 'gp41' &
type.y == 'IgG' &
antigen.y == 'gp41'
)
compareImmuneX2 %>%
filter(type.x == 'IgG' & type.y == 'IgG' & antigen.x != antigen.y) %>%
write_csv('tables/chisq_IgG_comparasons.csv')
psN2 %>% phylo_join(scores(psN2.pcoa, display = "sites", choices = 1:10) %>%
as.data.frame %>% rownames_to_column, by = 'rowname') %>%
sample_data %>% as('data.frame') %>% rownames_to_column -> hereSam
data_frame(formula = paste("transformation(ydata) ~ MDS", 1:10, sep = ""))
EachMDS <- function(x, nperm = 9999, transformation = medcode2, family = 'binomial'){
## Pull out the needed data
psN2.wMDS <- psN2 %>% phylo_join(scores(psN2.pcoa, display = "sites", choices = 1:10) %>%
as.data.frame %>% rownames_to_column, by = 'rowname')
# medWuf <- NA
# rankWuf <- NA
locPS <- phylo_join(psN2.wMDS, x, by = 'pub_id')
ydata0 <- sample_data(locPS)$mag
yna <- is.na(ydata0)
#loc.wuf <- wufKN2
#loc.jsd <- jsdKN2
ydata <- ydata0
ydata <- ydata0[!yna]
loc.wuf2 <- psN2.wuf %>% as.matrix %>% .[!yna, !yna]
samDf <- locPS %>% sample_data %>% as('data.frame') %>% rownames_to_column %>%
.[!yna,]
# # Is giving only positive results with CAP1, not sure why
loc_glm <- glm(as.formula("transformation(ydata) ~ MDS1"), data = samDf, family = family)
glmAnova <- loc_glm %>% anova(test = "Chisq")
# data_frame, rather than data.frame
# https://stackoverflow.com/questions/48450308/iterating-over-formulas-in-purrr#48450308
data_frame(formulaString = paste("transformation(ydata) ~ MDS", 1:10, sep = "")) %>%
mutate(model = map(formulaString, function(fs){
glm(as.formula(fs), data = samDf, family = family)})) %>%
mutate(anova = map(model, anova)) %>%
mutate(glance = map(model, glance)) %>%
mutate(tidy = map(model, tidy)) %>%
mutate(coef = map(model, ~ coef(summary(.))[2,])) %>%
pass -> allmodels
allmodels %>% dplyr::select("tidy") %>% unnest %>% filter(term != '(Intercept)')
}
# Just confirming that the function works before it goes in a giant loop. I'd delete this,
# but i'll just end up needing it again if I do.
ptm = proc.time()
tps <- EachMDS(test.immune.pteg, nperm = 9999, transformation = medcode, family = 'binomial')
proc.time() - ptm
user system elapsed
0.480 0.014 0.516
tps
use.immune %>%
group_by(type, antigen, month) %>%
nest %>%
mutate(coefs = map(data, ~ EachMDS(.))) %>%
dplyr::select(-data) %>% unnest(coefs) -> glmMDScoefs
ants1
[1] "Con.6.gp120.B" "ZM96.gp140" "gp70_B.CaseA_V1_V2"
glmMDScoefs %>%
gather(key = "key", value = "value", estimate:p.value) %>%
filter(key == "p.value") %>%
spread(key = term, value = value) %>%
dplyr::select(-key, -MDS10, MDS10) %>%
dplyr::rename(Type = type, Antigen = antigen, Month = month) %>%
mutate(Type = factor(Type, levels = c( "IgA", "IgG", "CD4+"))) %>%
mutate(Antigen = factor(Antigen, levels = c(ants1, ants2, "ANY.ENV.PTEG"))) %>%
#Clean up labels
mutate(Antigen = stringr::str_replace_all(Antigen, "_", " ")) %>%
mutate(Antigen = stringr::str_replace_all(Antigen, "V1 V2", "V1-V2")) %>%
mutate(Antigen = stringr::str_replace_all(Antigen, "ANY.ENV.PTEG", "Any ENV PTEG")) %>%
arrange(Type) -> allMDS
allMDS
allMDS %>%
kable("html", escape = F, digits = 3, align = 'c') %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
as.character() -> allMDS.html
allMDS %>%
kable("html", escape = F, digits = 3, align = 'c') %>%
collapse_rows(columns = 1:2, latex_hline = "full")
| Type | Antigen | Month | MDS1 | MDS2 | MDS3 | MDS4 | MDS5 | MDS6 | MDS7 | MDS8 | MDS9 | MDS10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IgA | gp41 | 0.0 | 0.653 | 0.607 | 0.401 | 0.702 | 0.786 | 0.650 | 0.242 | 0.349 | 0.038 | 0.700 |
| 6.5 | 0.168 | 0.368 | 0.701 | 0.954 | 0.112 | 0.142 | 0.456 | 0.475 | 0.789 | 0.864 | ||
| 12.0 | 0.855 | 0.201 | 0.462 | 0.408 | 0.994 | 0.798 | 0.469 | 0.106 | 0.079 | 0.232 | ||
| p24 | 0.0 | 0.335 | 0.629 | 0.520 | 0.939 | 0.666 | 0.657 | 0.148 | 0.853 | 0.921 | 0.517 | |
| 6.5 | 0.652 | 0.281 | 0.681 | 0.414 | 0.438 | 0.599 | 0.999 | 0.199 | 0.881 | 0.351 | ||
| 12.0 | 0.398 | 0.724 | 0.227 | 0.202 | 0.128 | 0.465 | 0.445 | 0.156 | 0.516 | 0.159 | ||
| IgG | Con.6.gp120.B | 6.5 | 0.017 | 0.536 | 0.420 | 0.890 | 0.325 | 0.427 | 0.853 | 0.806 | 0.634 | 0.250 |
| 12.0 | 0.031 | 0.878 | 0.377 | 0.635 | 0.989 | 0.540 | 0.569 | 0.374 | 0.686 | 0.299 | ||
| gp41 | 0.0 | 0.040 | 0.894 | 0.304 | 0.967 | 0.442 | 0.231 | 0.504 | 0.595 | 0.856 | 0.509 | |
| 6.5 | 0.057 | 0.274 | 0.226 | 0.891 | 0.858 | 0.165 | 0.601 | 0.426 | 0.212 | 0.606 | ||
| 12.0 | 0.779 | 0.310 | 0.443 | 0.228 | 0.680 | 0.268 | 0.995 | 0.039 | 0.357 | 0.054 | ||
| gp70 B.CaseA V1-V2 | 6.5 | 0.621 | 0.853 | 0.300 | 0.458 | 0.507 | 0.690 | 0.506 | 0.161 | 0.039 | 0.880 | |
| 12.0 | 0.037 | 0.665 | 0.808 | 0.447 | 0.318 | 0.296 | 0.743 | 0.143 | 0.403 | 0.276 | ||
| p24 | 0.0 | 0.451 | 0.231 | 0.986 | 0.088 | 0.149 | 0.051 | 0.882 | 0.507 | 0.384 | 0.439 | |
| 6.5 | 0.404 | 0.828 | 0.243 | 0.250 | 0.689 | 0.082 | 0.180 | 0.846 | 0.360 | 0.101 | ||
| 12.0 | 0.182 | 0.721 | 0.237 | 0.676 | 0.933 | 0.371 | 0.042 | 0.423 | 0.716 | 0.147 | ||
| ZM96.gp140 | 6.5 | 0.030 | 0.631 | 0.244 | 0.750 | 0.234 | 0.949 | 0.890 | 0.994 | 0.090 | 0.504 | |
| 12.0 | 0.186 | 0.353 | 0.286 | 0.722 | 0.889 | 0.409 | 0.459 | 0.376 | 0.021 | 0.189 | ||
| CD4+ | Any ENV PTEG | 6.5 | 0.237 | 0.555 | 0.771 | 0.108 | 0.237 | 0.104 | 0.717 | 0.084 | 0.627 | 0.319 |
| 12.0 | 0.248 | 0.565 | 0.434 | 0.103 | 0.117 | 0.856 | 0.861 | 0.128 | 0.773 | 0.723 |
allMDS.html
[1] "<table>\n <thead>\n <tr>\n <th style=\"text-align:center;\"> Type </th>\n <th style=\"text-align:center;\"> Antigen </th>\n <th style=\"text-align:center;\"> Month </th>\n <th style=\"text-align:center;\"> MDS1 </th>\n <th style=\"text-align:center;\"> MDS2 </th>\n <th style=\"text-align:center;\"> MDS3 </th>\n <th style=\"text-align:center;\"> MDS4 </th>\n <th style=\"text-align:center;\"> MDS5 </th>\n <th style=\"text-align:center;\"> MDS6 </th>\n <th style=\"text-align:center;\"> MDS7 </th>\n <th style=\"text-align:center;\"> MDS8 </th>\n <th style=\"text-align:center;\"> MDS9 </th>\n <th style=\"text-align:center;\"> MDS10 </th>\n </tr>\n </thead>\n<tbody>\n <tr>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"6\"> IgA </td>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> gp41 </td>\n <td style=\"text-align:center;\"> 0.0 </td>\n <td style=\"text-align:center;\"> 0.653 </td>\n <td style=\"text-align:center;\"> 0.607 </td>\n <td style=\"text-align:center;\"> 0.401 </td>\n <td style=\"text-align:center;\"> 0.702 </td>\n <td style=\"text-align:center;\"> 0.786 </td>\n <td style=\"text-align:center;\"> 0.650 </td>\n <td style=\"text-align:center;\"> 0.242 </td>\n <td style=\"text-align:center;\"> 0.349 </td>\n <td style=\"text-align:center;\"> 0.038 </td>\n <td style=\"text-align:center;\"> 0.700 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.168 </td>\n <td style=\"text-align:center;\"> 0.368 </td>\n <td style=\"text-align:center;\"> 0.701 </td>\n <td style=\"text-align:center;\"> 0.954 </td>\n <td style=\"text-align:center;\"> 0.112 </td>\n <td style=\"text-align:center;\"> 0.142 </td>\n <td style=\"text-align:center;\"> 0.456 </td>\n <td style=\"text-align:center;\"> 0.475 </td>\n <td style=\"text-align:center;\"> 0.789 </td>\n <td style=\"text-align:center;\"> 0.864 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.855 </td>\n <td style=\"text-align:center;\"> 0.201 </td>\n <td style=\"text-align:center;\"> 0.462 </td>\n <td style=\"text-align:center;\"> 0.408 </td>\n <td style=\"text-align:center;\"> 0.994 </td>\n <td style=\"text-align:center;\"> 0.798 </td>\n <td style=\"text-align:center;\"> 0.469 </td>\n <td style=\"text-align:center;\"> 0.106 </td>\n <td style=\"text-align:center;\"> 0.079 </td>\n <td style=\"text-align:center;\"> 0.232 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> p24 </td>\n <td style=\"text-align:center;\"> 0.0 </td>\n <td style=\"text-align:center;\"> 0.335 </td>\n <td style=\"text-align:center;\"> 0.629 </td>\n <td style=\"text-align:center;\"> 0.520 </td>\n <td style=\"text-align:center;\"> 0.939 </td>\n <td style=\"text-align:center;\"> 0.666 </td>\n <td style=\"text-align:center;\"> 0.657 </td>\n <td style=\"text-align:center;\"> 0.148 </td>\n <td style=\"text-align:center;\"> 0.853 </td>\n <td style=\"text-align:center;\"> 0.921 </td>\n <td style=\"text-align:center;\"> 0.517 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.652 </td>\n <td style=\"text-align:center;\"> 0.281 </td>\n <td style=\"text-align:center;\"> 0.681 </td>\n <td style=\"text-align:center;\"> 0.414 </td>\n <td style=\"text-align:center;\"> 0.438 </td>\n <td style=\"text-align:center;\"> 0.599 </td>\n <td style=\"text-align:center;\"> 0.999 </td>\n <td style=\"text-align:center;\"> 0.199 </td>\n <td style=\"text-align:center;\"> 0.881 </td>\n <td style=\"text-align:center;\"> 0.351 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.398 </td>\n <td style=\"text-align:center;\"> 0.724 </td>\n <td style=\"text-align:center;\"> 0.227 </td>\n <td style=\"text-align:center;\"> 0.202 </td>\n <td style=\"text-align:center;\"> 0.128 </td>\n <td style=\"text-align:center;\"> 0.465 </td>\n <td style=\"text-align:center;\"> 0.445 </td>\n <td style=\"text-align:center;\"> 0.156 </td>\n <td style=\"text-align:center;\"> 0.516 </td>\n <td style=\"text-align:center;\"> 0.159 </td>\n </tr>\n <tr>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"12\"> IgG </td>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> Con.6.gp120.B </td>\n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.017 </td>\n <td style=\"text-align:center;\"> 0.536 </td>\n <td style=\"text-align:center;\"> 0.420 </td>\n <td style=\"text-align:center;\"> 0.890 </td>\n <td style=\"text-align:center;\"> 0.325 </td>\n <td style=\"text-align:center;\"> 0.427 </td>\n <td style=\"text-align:center;\"> 0.853 </td>\n <td style=\"text-align:center;\"> 0.806 </td>\n <td style=\"text-align:center;\"> 0.634 </td>\n <td style=\"text-align:center;\"> 0.250 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.031 </td>\n <td style=\"text-align:center;\"> 0.878 </td>\n <td style=\"text-align:center;\"> 0.377 </td>\n <td style=\"text-align:center;\"> 0.635 </td>\n <td style=\"text-align:center;\"> 0.989 </td>\n <td style=\"text-align:center;\"> 0.540 </td>\n <td style=\"text-align:center;\"> 0.569 </td>\n <td style=\"text-align:center;\"> 0.374 </td>\n <td style=\"text-align:center;\"> 0.686 </td>\n <td style=\"text-align:center;\"> 0.299 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> gp41 </td>\n <td style=\"text-align:center;\"> 0.0 </td>\n <td style=\"text-align:center;\"> 0.040 </td>\n <td style=\"text-align:center;\"> 0.894 </td>\n <td style=\"text-align:center;\"> 0.304 </td>\n <td style=\"text-align:center;\"> 0.967 </td>\n <td style=\"text-align:center;\"> 0.442 </td>\n <td style=\"text-align:center;\"> 0.231 </td>\n <td style=\"text-align:center;\"> 0.504 </td>\n <td style=\"text-align:center;\"> 0.595 </td>\n <td style=\"text-align:center;\"> 0.856 </td>\n <td style=\"text-align:center;\"> 0.509 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.057 </td>\n <td style=\"text-align:center;\"> 0.274 </td>\n <td style=\"text-align:center;\"> 0.226 </td>\n <td style=\"text-align:center;\"> 0.891 </td>\n <td style=\"text-align:center;\"> 0.858 </td>\n <td style=\"text-align:center;\"> 0.165 </td>\n <td style=\"text-align:center;\"> 0.601 </td>\n <td style=\"text-align:center;\"> 0.426 </td>\n <td style=\"text-align:center;\"> 0.212 </td>\n <td style=\"text-align:center;\"> 0.606 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.779 </td>\n <td style=\"text-align:center;\"> 0.310 </td>\n <td style=\"text-align:center;\"> 0.443 </td>\n <td style=\"text-align:center;\"> 0.228 </td>\n <td style=\"text-align:center;\"> 0.680 </td>\n <td style=\"text-align:center;\"> 0.268 </td>\n <td style=\"text-align:center;\"> 0.995 </td>\n <td style=\"text-align:center;\"> 0.039 </td>\n <td style=\"text-align:center;\"> 0.357 </td>\n <td style=\"text-align:center;\"> 0.054 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> gp70 B.CaseA V1-V2 </td>\n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.621 </td>\n <td style=\"text-align:center;\"> 0.853 </td>\n <td style=\"text-align:center;\"> 0.300 </td>\n <td style=\"text-align:center;\"> 0.458 </td>\n <td style=\"text-align:center;\"> 0.507 </td>\n <td style=\"text-align:center;\"> 0.690 </td>\n <td style=\"text-align:center;\"> 0.506 </td>\n <td style=\"text-align:center;\"> 0.161 </td>\n <td style=\"text-align:center;\"> 0.039 </td>\n <td style=\"text-align:center;\"> 0.880 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.037 </td>\n <td style=\"text-align:center;\"> 0.665 </td>\n <td style=\"text-align:center;\"> 0.808 </td>\n <td style=\"text-align:center;\"> 0.447 </td>\n <td style=\"text-align:center;\"> 0.318 </td>\n <td style=\"text-align:center;\"> 0.296 </td>\n <td style=\"text-align:center;\"> 0.743 </td>\n <td style=\"text-align:center;\"> 0.143 </td>\n <td style=\"text-align:center;\"> 0.403 </td>\n <td style=\"text-align:center;\"> 0.276 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"3\"> p24 </td>\n <td style=\"text-align:center;\"> 0.0 </td>\n <td style=\"text-align:center;\"> 0.451 </td>\n <td style=\"text-align:center;\"> 0.231 </td>\n <td style=\"text-align:center;\"> 0.986 </td>\n <td style=\"text-align:center;\"> 0.088 </td>\n <td style=\"text-align:center;\"> 0.149 </td>\n <td style=\"text-align:center;\"> 0.051 </td>\n <td style=\"text-align:center;\"> 0.882 </td>\n <td style=\"text-align:center;\"> 0.507 </td>\n <td style=\"text-align:center;\"> 0.384 </td>\n <td style=\"text-align:center;\"> 0.439 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.404 </td>\n <td style=\"text-align:center;\"> 0.828 </td>\n <td style=\"text-align:center;\"> 0.243 </td>\n <td style=\"text-align:center;\"> 0.250 </td>\n <td style=\"text-align:center;\"> 0.689 </td>\n <td style=\"text-align:center;\"> 0.082 </td>\n <td style=\"text-align:center;\"> 0.180 </td>\n <td style=\"text-align:center;\"> 0.846 </td>\n <td style=\"text-align:center;\"> 0.360 </td>\n <td style=\"text-align:center;\"> 0.101 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.182 </td>\n <td style=\"text-align:center;\"> 0.721 </td>\n <td style=\"text-align:center;\"> 0.237 </td>\n <td style=\"text-align:center;\"> 0.676 </td>\n <td style=\"text-align:center;\"> 0.933 </td>\n <td style=\"text-align:center;\"> 0.371 </td>\n <td style=\"text-align:center;\"> 0.042 </td>\n <td style=\"text-align:center;\"> 0.423 </td>\n <td style=\"text-align:center;\"> 0.716 </td>\n <td style=\"text-align:center;\"> 0.147 </td>\n </tr>\n <tr>\n \n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> ZM96.gp140 </td>\n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.030 </td>\n <td style=\"text-align:center;\"> 0.631 </td>\n <td style=\"text-align:center;\"> 0.244 </td>\n <td style=\"text-align:center;\"> 0.750 </td>\n <td style=\"text-align:center;\"> 0.234 </td>\n <td style=\"text-align:center;\"> 0.949 </td>\n <td style=\"text-align:center;\"> 0.890 </td>\n <td style=\"text-align:center;\"> 0.994 </td>\n <td style=\"text-align:center;\"> 0.090 </td>\n <td style=\"text-align:center;\"> 0.504 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.186 </td>\n <td style=\"text-align:center;\"> 0.353 </td>\n <td style=\"text-align:center;\"> 0.286 </td>\n <td style=\"text-align:center;\"> 0.722 </td>\n <td style=\"text-align:center;\"> 0.889 </td>\n <td style=\"text-align:center;\"> 0.409 </td>\n <td style=\"text-align:center;\"> 0.459 </td>\n <td style=\"text-align:center;\"> 0.376 </td>\n <td style=\"text-align:center;\"> 0.021 </td>\n <td style=\"text-align:center;\"> 0.189 </td>\n </tr>\n <tr>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> CD4+ </td>\n <td style=\"text-align:center;vertical-align: middle !important;\" rowspan=\"2\"> Any ENV PTEG </td>\n <td style=\"text-align:center;\"> 6.5 </td>\n <td style=\"text-align:center;\"> 0.237 </td>\n <td style=\"text-align:center;\"> 0.555 </td>\n <td style=\"text-align:center;\"> 0.771 </td>\n <td style=\"text-align:center;\"> 0.108 </td>\n <td style=\"text-align:center;\"> 0.237 </td>\n <td style=\"text-align:center;\"> 0.104 </td>\n <td style=\"text-align:center;\"> 0.717 </td>\n <td style=\"text-align:center;\"> 0.084 </td>\n <td style=\"text-align:center;\"> 0.627 </td>\n <td style=\"text-align:center;\"> 0.319 </td>\n </tr>\n <tr>\n \n \n <td style=\"text-align:center;\"> 12.0 </td>\n <td style=\"text-align:center;\"> 0.248 </td>\n <td style=\"text-align:center;\"> 0.565 </td>\n <td style=\"text-align:center;\"> 0.434 </td>\n <td style=\"text-align:center;\"> 0.103 </td>\n <td style=\"text-align:center;\"> 0.117 </td>\n <td style=\"text-align:center;\"> 0.856 </td>\n <td style=\"text-align:center;\"> 0.861 </td>\n <td style=\"text-align:center;\"> 0.128 </td>\n <td style=\"text-align:center;\"> 0.773 </td>\n <td style=\"text-align:center;\"> 0.723 </td>\n </tr>\n</tbody>\n</table>"
allMDS.html %>% cat(file = 'tables/allMDS.html')
allMDS %>%
kable("latex", escape = F, digits = 3, align = 'c', booktabs = T) %>%
collapse_rows(columns = 1:2, latex_hline = "full") %>%
pass -> allMDS.latex
allMDS.latex %>% cat(file = 'tables/allMDS.tex')
# Print latex table to tex file
cat(docHead, allMDS.latex, docTail, file = 'tables/allMDS.tex')
write_csv(allMDS, 'tables/allMDSGlmPValues.csv')
at each taxonomic level
# How many taxa do we see if we agglomerate at different levels
psN2 %>% tax_table %>% as.data.frame %>% dplyr::select(Phylum:Genus) %>% colnames -> taxLevels
data_frame(taxLevels) %>%
mutate(ntaxa = map(taxLevels,
function(lev){
psN2 %>% tax_glom(lev) %>% ntaxa
}
)) %>%
mutate(ntaxa = unlist(ntaxa)) %>%
pass -> NTaxaAtLevel
NTaxaAtLevel
data_frame(taxLevels = "Species", ntaxa = ntaxa(psN2), ps = list((psN2))) -> specRow
data_frame(taxLevels = "Species", ntaxa = ntaxa(psN1), psCount = list((psN1))) -> specRowC
D2K_savename <- function(distmat){
# cascade names forward with the D2K operation
require(MiRKAT)
out <- MiRKAT::D2K(distmat)
colnames(out) <- colnames(distmat)
rownames(out) <- rownames(distmat)
out
}
# Data frame of phyloseq objects distances and kernels at a bunch of taxonomic levels
NTaxaAtLevel %>%
mutate(ps = map(ntaxa, ~tip_glom_saveid(psN2, k = .))) %>%
# process the phyloseq objects so they have better names
mutate(ps = map(ps, ~swap.phyloseq.taxnames(tag_phyloseq(remove_tag_phyloseq(.)), oldname = 'oldname2'))) %>%
# add in the species data row (which should already have correct names)
bind_rows(specRow) %>%
# calculate jensen-shannon distance matrix
mutate(jsd = map(ps, ~phyloseq::distance(., method = "jsd") )) %>%
# convert to 2d matrix
mutate(jsdMat = map(jsd, ~as.matrix(.))) %>%
# calculate kernel
mutate(kjsd = map(jsdMat, ~D2K_savename(.))) -> tmp
Loading required package: cluster
Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.
tmp %>%
mutate(psNoZero = map(ps, ~transform_sample_counts(., function(x) x+(1/1000)))) -> tmp
tmp %>%
## chemometrics::clr just works, while compositions::clr throws a criptic error message here
mutate(clr = map(psNoZero, ~ transform_otu_table(., chemometrics::clr))) %>%
#mutate(clr = map(psNoZero, ~ transform_otu_table(., function(x) as.matrix(compositions::clr(x))))) %>%
pass -> psDf0 # Original way
# Data frame of phyloseq objects distances and kernels at a bunch of taxonomic levels
# I use psN1 because I need count data for some downstream steps.
NTaxaAtLevel %>%
mutate(psCount = map(ntaxa, ~tip_glom_saveid(psN1, k = .))) %>%
# process the phyloseq objects so they have better names
mutate(psCount = map(psCount, ~swap.phyloseq.taxnames(tag_phyloseq(remove_tag_phyloseq(.)), oldname = 'oldname2'))) %>%
# add in the species data row (which should already have correct names)
bind_rows(specRowC) %>%
pass -> tmp
Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.Setting row names on a tibble is deprecated.
tmp %>%
# calculate jensen-shannon distance matrix
mutate(ps = map(psCount, ~transform_sample_counts(., function(x) {x/sum(x)}))) %>%
mutate(jsd = map(ps, ~phyloseq::distance(., method = "jsd") )) %>%
# convert to 2d matrix
mutate(jsdMat = map(jsd, ~as.matrix(.))) %>%
# calculate kernel
mutate(kjsd = map(jsdMat, ~D2K_savename(.))) -> tmp
tmp %>%
mutate(psNoZero = map(ps, ~transform_sample_counts(., function(x) x+(1/1000)))) %>%
## chemometrics::clr just works, while compositions::clr throws a criptic error message here
mutate(clr = map(psNoZero, ~ transform_otu_table(., chemometrics::clr))) %>%
#mutate(clr = map(psNoZero, ~ transform_otu_table(., function(x) as.matrix(compositions::clr(x))))) %>%
pass -> psDf
print(psDf)
MirMulti <- function(x, KsDf = psDf, ps = psN2, nperm = 9999){
Ks = KsDf$kjsd
# I bind to the phyloseq object and then peel off again later to guerentee
# that the y-data is in the same order as the Ks
locPS <- phylo_join(ps, x, by = 'pub_id')
ydata0 <- sample_data(locPS)$mag
yna <- is.na(ydata0)
ydata <- ydata0[!yna]
loc.Ks <- lapply(Ks, function(K){K[!yna, !yna]})
bcxJSD <- MiRKAT(y = jac_box_cox(ydata), Ks = loc.Ks, out_type = "C", method = 'permutation', nperm = nperm)
medJSD <- MiRKAT(y = medcode(ydata), Ks = loc.Ks, out_type = "D", method = 'permutation', nperm = nperm)
mmDf = data.frame(
taxLevels = KsDf$taxLevels,
ntaxa = KsDf$ntaxa,
bcxJSD = bcxJSD$indivP, medJSD = medJSD$indivP,
bcxJSDOmni = bcxJSD$omnibus_p, medJSDOmni = medJSD$omnibus_p)
mmDf
}
# Test case
use.immune %>%
filter(type == 'IgG' & antigen == 'gp41'& visitno == 2 & ct == 'T') -> test.immune1
test.mm <- MirMulti(test.immune1, Ks = psDf, nperm = 999)
test.mm
ptm = proc.time()
use.immune %>%
group_by(type, antigen, month) %>%
nest %>%
mutate(mir = map(data,
~MirMulti(., Ks = psDf, ps = psN2, nperm = 999)
)) %>%
dplyr::select(-data) %>% unnest(mir) %>%
pass -> mirLevels
proc.time() - ptm
user system elapsed
5.885 0.014 5.972
mirLevels %>% dplyr::select(-ntaxa, -medJSD) %>% spread(key = taxLevels, value = bcxJSD)
mirLevels %>% dplyr::select(-ntaxa, -bcxJSD) %>% spread(key = taxLevels, value = medJSD)
I’d like to combine the above into one table, when it isn’t 7:45. Probably has soemething to do with merging columns or something. Or maybe I just want to plot it as a figure.
mirLevels %>%
gather(metric, P, bcxJSD:medJSD) -> mirDat
mirDat %>% dplyr::select(type:month, bcxJSD = bcxJSDOmni, medJSD = medJSDOmni) %>%
group_by(type, antigen, month) %>%
summarize(bcxJSD = mean(bcxJSD), medJSD = mean(medJSD)) %>%
gather(metric, P, bcxJSD, medJSD) -> mirOmni
mirOmni
NTaxaAtLevel %>% bind_rows(specRow[,1:2]) %>% unite(nLev, taxLevels, ntaxa, remove = FALSE) -> NTaxaAtLevel2
NTaxaAtLevel2
bind_rows(
permKernTable %>% mutate(metric = 'med'),
permKernTableGaus %>% mutate(metric = 'bcx')
) %>%
dplyr::select(type, antigen, month, metric, mir.P) %>%
pass -> WufPData
fixant <- function(df){
df %>%
mutate(antigen = stringr::str_replace_all(antigen, "\\.", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
#mutate(metric = stringr::str_replace_all(metric, "bcx", "")) %>%
pass
}
fixstuff <- function(df){
df %>%
fixant %>%
mutate(metric = stringr::str_replace_all(metric, "JSD", "")) %>%
pass
}
mirDat %>% fixant %>% head
mirDat %>%
#mutate(antigen = stringr::str_replace_all(antigen, "\\.", " ")) %>%
fixstuff %>%
ggplot(aes(x = ntaxa, y = P, col = factor(month), fill = factor(month))) +
geom_point(pch = 21) +
facet_grid(type + antigen ~ metric, labeller = labeller(antigen = label_wrap_gen(width = 10))) +
scale_x_log10(breaks = c(3, NTaxaAtLevel2$ntaxa, 1000), labels = c("omni", NTaxaAtLevel2$nLev, "wunifrac")) +
scale_y_log10(breaks = c(0.002, 0.01, 0.05, 0.2, 1)) +
geom_hline(yintercept=0.05, col = 'blue', alpha = 0.5) + geom_hline(yintercept=0.01, col = 'red', alpha = 0.5) +
#geom_hline(data = mirOmni, aes(yintercept = P, col = factor(month))) +
#annotation_logticks(sides = 'bl') +
#geom_rug(data = mirOmni, aes(y = P, col = factor(month)), inherit.aes = F) +
geom_point(data = mirOmni %>% ungroup %>% fixstuff,
aes(x = 3, y = P, col = factor(month), fill = factor(month)), inherit.aes = F, pch = 22, size = 2) +
geom_point(data = WufPData %>% ungroup %>% fixant,
aes(x = 1000, y = mir.P, col = factor(month), fill = factor(month)), inherit.aes = F, pch = 24, size = 2) +
scale_colour_manual(values=cbPalette) +
scale_fill_manual(values=alpha(cbPalette, 0.5)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) -> pjsd0
options(repr.plot.width=8, repr.plot.height= 6)
pjsd0
options(par0)
# I'd like to add weighted unifrac as a tick mark on the right.
NTaxaAtLevel2
# New combined data frame that has omnibus, regular, and wunifrac all in one
bind_rows(
mirDat %>% mutate(metric = stringr::str_replace_all(metric, "JSD", "")) %>%
mutate(test = "JSD"),
mirOmni %>% ungroup %>% mutate(metric = stringr::str_replace_all(metric, "JSD", "")) %>%
mutate(taxLevels = "Omnibus") %>% mutate(test = "Omnibus"),
WufPData %>% ungroup %>% dplyr::rename(P = mir.P) %>%
mutate(taxLevels = "WUnifrac") %>% mutate(test = "WUnifrac")
) %>%
mutate(antigen = factor(antigen, levels = c(ants2, ants1, "ANY.ENV.PTEG"))) %>%
mutate(type = factor(type, levels = c("IgA", "IgG", "CD4+"))) %>%
mutate(taxLevels = factor(taxLevels, levels = c("Omnibus", NTaxaAtLevel2$taxLevels, "WUnifrac"))) %>%
dplyr::select(-c(bcxJSDOmni:medJSDOmni))%>%
unite(nLev, taxLevels, ntaxa, remove = FALSE) %>%
mutate(nLev = stringr::str_replace(nLev, "_NA", "")) %>%
#mutate(antigen = stringr::str_replace_all(antigen, "\\.", " ")) %>%
#mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = factor(antigen, labels = stringr::str_replace_all(levels(antigen), "\\.", " "))) %>%
mutate(antigen = factor(antigen, labels = stringr::str_replace_all(levels(antigen), "_", " "))) %>%
# mutate(antigen = factor(antigen, labels = (levels(antigen)))) %>%
# mutate(antigen = factor(antigen, labels = (levels(antigen)))) %>%
mutate(test = factor(test, levels = c('Omnibus', 'JSD', 'WUnifrac'))) %>%
pass -> mirDat2
binding factor and character vector, coercing into character vectorbinding character and factor vector, coercing into character vector
mirDat2 %>% filter(type == 'CD4+')
mirDat2 %>% head
mirDat2 %>%
filter(type == "IgG") %>%
ggplot(aes(x = taxLevels, y = P, col = factor(month), fill = factor(month), shape = test)) +
geom_point(size = 2) +
facet_grid(type + antigen ~ metric, labeller = labeller(antigen = label_wrap_gen(width = 10))) +
#scale_x_log10(breaks = c(3, NTaxaAtLevel2$ntaxa, 1000), labels = c("omni", NTaxaAtLevel2$nLev, "wunifrac")) +
scale_y_log10(breaks = c(0.002, 0.01, 0.05, 0.2, 1)) +
geom_hline(yintercept=0.05, col = 'blue', alpha = 0.5) + geom_hline(yintercept=0.01, col = 'red', alpha = 0.5) +
scale_shape_manual(values = c(22, 21, 24)) +
scale_colour_manual(values=cbPalette) +
scale_fill_manual(values=alpha(cbPalette, 0.5)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) -> pjsd
options(repr.plot.width=6, repr.plot.height= 8)
pjsd
ggsave('figures/KernelPVsLevel.png', width = 6, height = 8)
options(par0)
X-axis is now spaced evenly
Table SX. P values of kernel regression tests. Circles indicate jensen shannon values at different taxonomic resolutions. Squares are the omnibus p-value for that cohort of tests. Triangles indicate kernel regression p-values for the corresponding weighted unifrac test.
The blue and red lines indicate p values of 5% and 1% respectively.
Observations: The weighted unifrac test is sensitive. In cases where only one taxonic level hits, weighted unifrac often also falls at some statistically significant value. The omnibus p value is often higher than the weighted unifrac one. Weighted unifrac seems like a good test for identifying patterns at any level that relate to an outcome. The jensen shannon informs us about which level the pattern is observed.
I might even be able to drill down to every level.
model_each_species <- function(ps, f, pthresh = 1, q = FALSE){
# Start with the otu table
ps %>%
# reshape it so we have clr values for every taxon-sample pair
otu_table %>% as.data.frame %>% rownames_to_column("Sample") %>% gather(Taxon, clr, -Sample) %>%
# bind that to the sample data
# doing this here seems remarkably inefficient, but its not creating a bottleneck so I'll leave it.
left_join(
ps %>%
# the sample data need to have MDS1 and MDS2 appended to them
phylo_join(
psN2.pcoa %>% scores(display = "sites") %>% # hardcoded psN2.pcoa
as.data.frame %>%
rownames_to_column %>%
dplyr::select('rowname', 'MDS1', 'MDS2'),
by = 'rowname'
) %>%
# back to binding to sample data
sample_data %>% as('data.frame') %>% rownames_to_column("Sample"),
by = 'Sample') %>%
group_by(Taxon) %>% # group and nest for model run
nest %>%
mutate(Mod = map(data, f)) %>% # apply model over each species
mutate(Glance = map(Mod, glance), Tidy = map(Mod, tidy)) %>% # extract relevant data from model
# view model
dplyr::select(Taxon, Tidy) %>% unnest %>%
mutate(term = gsub('[\\( \\)]','', term)) %>% # remove parentheses from "(Intercept)"
gather(meas, val, estimate:p.value) %>%
unite(meas, term, meas) %>% spread(meas, val) %>% arrange(clr_estimate) %>%
dplyr::select(Taxon, Intercept_estimate, clr_estimate, clr_std.error, clr_p.value) %>%
# add q value
{if(q) mutate(., clr_q.value = p2q(clr_p.value)) else .} %>%
filter(clr_p.value < pthresh) %>%
#Join taxonomy information
left_join(
as.data.frame(ps@tax_table@.Data) %>% as.tibble %>% dplyr::select(Kingdom:Genus, Species, tag) %>%
mutate(tag = as.character(tag)), # mutate so tag is and taxon are both character class
by = c("Taxon" = "tag")) %>%
pass
}
model_each_species_for_antigen <- function(antigen, ps = psN2){
ps %>%
model_each_species(function(df){glm(medcode2(get(antigen)) ~ clr, data = df, family = 'binomial')}, q = TRUE, pthresh = 1)
}
ColsToRun <- c('IgG_Con.6.gp120.B_Month_6.5', 'IgG_Con.6.gp120.B_Month_12', 'IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_gp70_B.CaseA_V1_V2_Month_12', 'IgG_ZM96.gp140_Month_6.5', 'MDS1', 'isMale' )
model_each_species_case <- function(ps){
ps %>% model_each_species(function(df){glm(MDS1 ~ clr, data = df, family = 'gaussian')}, q = TRUE, pthresh = 1) %>%
arrange(clr_estimate) %>%
mutate(Taxon = factor(Taxon, levels = Taxon[order(clr_estimate)])) %>%
mutate(test = 'gaussian', antigen = 'MDS1') %>%
pass -> loc_mds1Glms
ps %>% model_each_species(function(df){glm(log10(IgG_Con.6.gp120.B_Month_12 + 100) ~ clr, data = df, family = 'gaussian')}, q = TRUE, pthresh = 1) %>%
arrange(clr_estimate) %>%
mutate(Taxon = factor(Taxon, levels = levels(loc_mds1Glms$Taxon))) %>%
mutate(test = 'gaussian', antigen = 'Con.6.gp120.B_Month_12') %>%
pass -> loc_gp120Glms
tibble(antigen = ColsToRun) %>% mutate(model = map(antigen, ~model_each_species_for_antigen(., ps = ps))) %>%
unnest %>% mutate(Taxon = factor(Taxon, levels = levels(loc_mds1Glms$Taxon))) %>%
mutate(test = 'binomial') %>%
pass-> loc_logitCoefs
#list(loc_mds1Glms, loc_gp120Glms, loc_logitCoefs)
bind_rows(loc_mds1Glms, loc_gp120Glms, loc_logitCoefs) %>% dplyr::select(test, antigen, everything())
}
#psDf %>% mutate(ps2 = map(ps, ~swap.phyloseq.taxnames(tag_phyloseq(remove_tag_phyloseq(.))))) -> test
#psDf[[1,"clr"]] %>% tax_table
psDf[[1]]
[1] "Phylum" "Class" "Order" "Family" "Genus" "Species"
psDf %>% print
ptm = proc.time()
psDf %>% dplyr::select(taxLevels, ntaxa, clr) %>% mutate(localmod = map(clr, model_each_species_case)) ->psDfLoc
glm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurredglm.fit: fitted probabilities numerically 0 or 1 occurred
proc.time() - ptm
user system elapsed
90.556 0.286 91.214
print(psDfLoc)
psDfLoc %>% dplyr::select(-clr) %>% unnest(localmod) -> tmp
Unequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorUnequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorUnequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorUnequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorUnequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorUnequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorUnequal factor levels: coercing to characterbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vectorbinding character and factor vector, coercing into character vector
psDfLoc$taxLevels
[1] "Phylum" "Class" "Order" "Family" "Genus" "Species"
tmp %>% mutate(taxLevels = factor(taxLevels, levels = psDfLoc$taxLevels)) -> LocalTests
LocalTests %>%
filter(antigen != "MDS1") %>%
write_csv("tables/AllLocalTests.csv")
LocalTests %>% head
options(repr.plot.width=6, repr.plot.height=8)
LocalTests %>%
filter(antigen != "MDS1") %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " IgG ", "")) %>%
ggplot(aes(x = factor(taxLevels), y = clr_q.value)) + geom_point(size = 0.1) +
geom_hline(yintercept = 0.2, color = 'grey') +
geom_hline(yintercept = 0.05, color = 'blue') + geom_hline(yintercept = 0.01, color = 'green') +
facet_wrap(~antigen + test) + scale_y_log10() +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
ggsave('figures/LocalQEveryLevel.png')
Saving 7.29 x 4.5 in image
options(repr.plot.width=6, repr.plot.height=8)
LocalTests %>%
filter(antigen != "MDS1") %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " IgG ", "")) %>%
ggplot(aes(x = factor(taxLevels), y = clr_p.value)) + geom_point(size = 0.1) +
geom_hline(yintercept = 0.2, color = 'grey') +
geom_hline(yintercept = 0.05, color = 'blue') + geom_hline(yintercept = 0.01, color = 'green') +
facet_wrap(~antigen + test) +# scale_y_log10() +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
ggsave('figures/LocalPEveryLevel.png')
Saving 7.29 x 4.5 in image
options(repr.plot.width=6, repr.plot.height=8)
LocalTests %>%
filter(antigen != "MDS1") %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " IgG ", "")) %>%
ggplot(aes(x = factor(taxLevels), y = clr_p.value)) + geom_point(size = 0.1) +
geom_hline(yintercept = 0.2, color = 'grey') +
geom_hline(yintercept = 0.05, color = 'blue') + geom_hline(yintercept = 0.01, color = 'green') +
facet_wrap(~antigen + test) + scale_y_log10() +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
ggsave('figures/LocalPEveryLevel_LogScale.png')
Saving 7.29 x 4.5 in image
order_taxa_by_mds1 <- function(df){
# this has to be a model_each_species type of data frame
df %>% filter(antigen == 'MDS1' & test == 'gaussian') %>%
mutate(TaxonF = factor(Taxon, levels = Taxon[order(clr_estimate)])) -> mds1df
df %>% mutate(TaxonF = factor(Taxon, levels = levels(mds1df$TaxonF)))
}
To my annoyance, everything is labeled with IgG except gp120_12
LocalTests %>%
filter(taxLevels == 'Family') %>%
order_taxa_by_mds1 %>%
filter(
(antigen %in% c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5'))|
(antigen == 'Con.6.gp120.B_Month_12' & test == 'gaussian')
) %>%
mutate(antigen = factor(antigen,
levels = c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5',
'IgG_Con.6.gp120.B_Month_6.5', 'Con.6.gp120.B_Month_12'))) %>%
filter(clr_p.value < 0.05 & clr_q.value < 0.2) %>%
dplyr::select(antigen:clr_estimate) %>%
dplyr::select(-Intercept_estimate) %>%
mutate(cordir = sign(clr_estimate)) %>%
pass
LocalTests %>% pull(antigen) %>% unique
[1] "MDS1" "Con.6.gp120.B_Month_12" "IgG_Con.6.gp120.B_Month_6.5"
[4] "IgG_Con.6.gp120.B_Month_12" "IgG_gp41_Month_0" "IgG_gp41_Month_6.5"
[7] "IgG_gp70_B.CaseA_V1_V2_Month_12" "IgG_ZM96.gp140_Month_6.5" "isMale"
# Family Hits
LocalTests %>%
filter(taxLevels == 'Family') %>%
order_taxa_by_mds1 %>%
filter(
(antigen %in% c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5',
'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'))|
(antigen == 'Con.6.gp120.B_Month_12' & test == 'gaussian')
) %>%
mutate(antigen = factor(antigen, levels = c(
'IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5', 'Con.6.gp120.B_Month_12',
'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'
))) %>%
pass -> tmp
#tmp$antigen %>% unique
tmp %>% filter(clr_p.value < 0.05 & clr_q.value < 0.2) %>%
pull(Taxon) %>% unique -> useFamily
tmp %>% filter(Taxon %in% useFamily) %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, "IgG ", "")) %>%
ggplot(aes(x = TaxonF, y = clr_estimate,
color = (clr_p.value < 0.05), shape =(clr_q.value < 0.2))) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = clr_estimate - 2*clr_std.error, ymax = clr_estimate + 2*clr_std.error)) +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) + geom_hline(yintercept = 0) +
facet_wrap(~antigen, ncol = 1, scales = 'free_y') + xlab("Family Level Taxon")
# Show the censored ones accross - so this would be everything with at least one hit
# but also show what they are in all cases.
ggsave('figures/anyFamilyIgg.png', width = 6, height = 8)
I think its worth digging into clostridia and Prophyromonidaceae with stacked bars
Family level
Lets come back to this after we’ve done the local tests. Since we need them to color code the axes.
psDf %>% print
psDf %>% filter(taxLevels == 'Family') %>% dplyr::select(ps) %>% pull %>%.[[1]]
phyloseq-class experiment-level object
otu_table() OTU Table: [ 39 taxa and 21 samples ]
sample_data() Sample Data: [ 21 samples by 35 sample variables ]
tax_table() Taxonomy Table: [ 39 taxa by 12 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 39 tips and 38 internal nodes ]
print(psDf)
#nFamilyTaxa <- NTaxaAtLevel %>% filter(taxLevels == 'Family') %>% pull(ntaxa)
psDf %>% filter(taxLevels == 'Family') %>% dplyr::select(psNoZero) %>% pull %>%.[[1]] %>%
otu_table %>% as.data.frame %>%
pass -> myRel
ptm = proc.time()
phiBoot <- boot(data = myRel, statistic = boot_phi, R = 1000)
proc.time() - ptm
user system elapsed
12.083 0.140 12.275
ptm = proc.time()
tidyCI <- unwarn(
tidy(phiBoot,conf.int=TRUE,conf.method="bca")
)
proc.time() - ptm
user system elapsed
187.449 5.115 25.266
myRel %>% make_proportionality_matrix %>%
as.data.frame %>%
rownames_to_column("TaxonX") %>% gather(TaxonY, phi, -TaxonX) %>%
filter(TaxonX != TaxonY) %>% data.frame(tidyCI) -> namedTidyCI
head(LocalTests)
LocalTests %>% filter(test == 'gaussian' &
antigen == 'MDS1' &
clr_p.value <0.05 &
clr_q.value < 0.2&
taxLevels == "Family") -> tmp
tmp %>% pull(Taxon) -> MDS1Fam
tmp %>% filter(clr_estimate < 0) %>% pull(Taxon) -> lowMDS1Fam
tmp %>% filter(clr_estimate >= 0) %>% pull(Taxon) -> highMDS1Fam
I’d like to do this, but for gp41 baseline and gp120 as well.
useFamily
[1] "Bacteroidetes.4" "Firmicutes.4" "Firmicutes.2" "Clostridia" "Bacteroides"
[6] "Firmicutes.5" "Porphyromonadaceae.1" "Porphyromonadaceae" "Bacteroidetes.1" "Anaerococcus"
LocalTests %>% head
reshape2::melt
function (data, ..., na.rm = FALSE, value.name = "value")
{
UseMethod("melt", data)
}
<bytecode: 0x564f3a1688f8>
<environment: namespace:reshape2>
targStat <- "phi"
namedTidyCI %>% dplyr::select(TaxonX, TaxonY, targStat) %>% spread(key = TaxonY, value = targStat) %>%
remove_rownames %>% column_to_rownames("TaxonX") %>%
as.dist %>% as.matrix -> phidata
phi_dd <- as.dist(phidata)
phi_hc <- hclust(phi_dd)
phidata %>%
#.[phi_hc$order, phi_hc$order] %>% # this way also worked just fine
reshape2::melt() %>%
mutate(Var1 = factor(Var1, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
mutate(Var2 = factor(Var2, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass-> tmp
p_phi_1 <- ggplot(tmp, aes(Var1, Var2, fill =(value))) +
geom_tile() +
scale_fill_gradient(high = "grey90", low = "red",
space = "Lab",
name="phi",
limits = c(NA, 3), na.value = "white") +
# theme_minimal()+ # minimal theme
theme(axis.text.x = element_text(
angle = 90, vjust = 1, size = 7, hjust = 1,
face = ifelse(levels(tmp$Var1) %in% useFamily, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% useFamily, "black", "grey30")
),
axis.text.y = element_text(
size = 7,
face = ifelse(levels(tmp$Var1) %in% MDS1Fam, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% lowMDS1Fam, "red",
ifelse(levels(tmp$Var1) %in% highMDS1Fam, "blue", "grey30"))
))+
coord_fixed() +
labs(x = "Family Any Igg",y = "Family MDS1 (red-low, blue-high)" ) +
# rectangles around the three clusters, positioned by eye
geom_rect(aes(xmin = 0 + 0.5, xmax = 10 - 0.5, ymin = 0 + 0.5, ymax = 10 - 0.5),
fill = "transparent", color = "gray20", size = 1.5) +
geom_rect(aes(xmin = 13 + 0.5, xmax = 24 - 0.5, ymin = 13 + 0.5, ymax = 24 - 0.5),
fill = "transparent", color = "gray20", size = 1.5) +
geom_rect(aes(xmin = 23 + 0.5, xmax = 37 - 0.5, ymin = 23 + 0.5, ymax = 37 - 0.5),
fill = "transparent", color = "gray20", size = 1.5)
p_phi_1
# ggsave("figures/phi_vs_mds1_and_igg.png", p_phi_1, width = 6, height = 6)
LocalTests %>%
filter(taxLevels == 'Family') %>%
order_taxa_by_mds1 %>%
filter(
(antigen %in% c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5',
'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'))|
(antigen == 'Con.6.gp120.B_Month_12' & test == 'gaussian')
) %>%
mutate(antigen = factor(antigen, levels = c('IgG_gp41_Month_0', 'IgG_gp41_Month_6.5', 'IgG_Con.6.gp120.B_Month_6.5', 'Con.6.gp120.B_Month_12',
'IgG_ZM96.gp140_Month_6.5','IgG_gp70_B.CaseA_V1_V2_Month_12'))) %>%
pass -> tmp
tmp %>% dplyr::select(antigen, Taxon, clr_estimate, clr_p.value, clr_q.value) %>%
mutate(clr_sign = sign(clr_estimate)) %>%
mutate(isHit = ifelse(clr_p.value < 0.05 & clr_q.value < 0.2, 1, 0)) %>%
mutate(Taxon = factor(Taxon, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass -> chorddata
targStat <- "conf.low"
namedTidyCI %>% dplyr::select(TaxonX, TaxonY, targStat) %>% spread(key = TaxonY, value = targStat) %>%
remove_rownames %>% column_to_rownames("TaxonX") %>%
as.dist %>% as.matrix -> phidata
phi_dd <- as.dist(phidata)
phi_hc <- hclust(phi_dd)
phidata %>%
#.[phi_hc$order, phi_hc$order] %>% # this way also worked just fine
reshape2::melt() %>%
mutate(Var1 = factor(Var1, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
mutate(Var2 = factor(Var2, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass-> tmp
ggplot(tmp, aes(Var1, Var2, fill =(value))) +
geom_tile() +
scale_fill_gradient(high = "grey90", low = "red",
space = "Lab",
name="phi",
limits = c(NA, 3), na.value = "white") +
# theme_minimal()+ # minimal theme
theme(axis.text.x = element_text(
angle = 90, vjust = 1, size = 7, hjust = 1,
face = ifelse(levels(tmp$Var1) %in% useFamily, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% useFamily, "black", "grey30")
),
axis.text.y = element_text(
size = 7,
face = ifelse(levels(tmp$Var1) %in% MDS1Fam, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% lowMDS1Fam, "red",
ifelse(levels(tmp$Var1) %in% highMDS1Fam, "blue", "grey30"))
))+
coord_fixed() +
labs(x = "Family Any Igg",y = "Family MDS1 (red-low, blue-high)" )-> p_phi_low
p_phi_low
# ggsave("figures/phi_vs_mds1_and_igg.png", p_phi_1, width = 6, height = 6)
targStat <- "conf.high"
namedTidyCI %>% dplyr::select(TaxonX, TaxonY, targStat) %>% spread(key = TaxonY, value = targStat) %>%
remove_rownames %>% column_to_rownames("TaxonX") %>%
as.dist %>% as.matrix -> phidata
phi_dd <- as.dist(phidata)
phi_hc <- hclust(phi_dd)
phidata %>%
#.[phi_hc$order, phi_hc$order] %>% # this way also worked just fine
reshape2::melt() %>%
mutate(Var1 = factor(Var1, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
mutate(Var2 = factor(Var2, levels = unique(phi_hc$labels)[phi_hc$order])) %>%
pass-> tmp
ggplot(tmp, aes(Var1, Var2, fill =(value))) +
geom_tile() +
scale_fill_gradient(high = "grey90", low = "red",
space = "Lab",
name="phi",
limits = c(NA, 3), na.value = "white") +
# theme_minimal()+ # minimal theme
theme(axis.text.x = element_text(
angle = 90, vjust = 1, size = 7, hjust = 1,
face = ifelse(levels(tmp$Var1) %in% useFamily, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% useFamily, "black", "grey30")
),
axis.text.y = element_text(
size = 7,
face = ifelse(levels(tmp$Var1) %in% MDS1Fam, "bold", "plain"),
colour = ifelse(levels(tmp$Var1) %in% lowMDS1Fam, "red",
ifelse(levels(tmp$Var1) %in% highMDS1Fam, "blue", "grey30"))
))+
coord_fixed() +
labs(x = "Family Any Igg",y = "Family MDS1 (red-low, blue-high)" )-> p_phi_high
p_phi_high
# ggsave("figures/phi_vs_mds1_and_igg.png", p_phi_1, width = 6, height = 6)
chorddata %>%
#Clean up labels
mutate(antigen = stringr::str_replace_all(antigen, "_", " ")) %>%
mutate(antigen = stringr::str_replace_all(antigen, " Month", " -- Month")) %>%
mutate(antigen = stringr::str_replace_all(antigen, "IgG ", "")) %>%
ggplot(
aes(x = Taxon, y = antigen, alpha = factor(isHit), color = factor(clr_sign))) +
scale_alpha_discrete(range = c(0, 1)) +
guides(alpha = FALSE) +
theme_minimal() +
coord_fixed(ratio = 2) +
scale_colour_manual(values = c("red", "blue")) +
theme(axis.text.x = element_text(
angle = 90, vjust = 0.5, size = 7, hjust = 1,
face = ifelse(levels(chorddata$Taxon) %in% useFamily, "bold", "plain"),
colour = ifelse(levels(chorddata$Taxon) %in% useFamily, "black", "grey30")),
plot.margin = unit(c(0,3,1,3), "cm")
) +
#guides(col = TRUE) +
guides(color=guide_legend(title="Sign of GLM")) +
labs(x = "Family",y = "Antigen -- Month" ) +
geom_point() -> guitar_chords
Using alpha for a discrete variable is not advised.
par <- options()
options(repr.plot.width=10, repr.plot.height= 5)
guitar_chords
options(par)
p_phi_1a <- p_phi_1 +
theme(axis.text.x = element_blank(),
axis.title.x = element_blank(),
plot.margin = unit(c(1, 3, -5.5, 4), "cm"))
par <- options()
options(repr.plot.width=8, repr.plot.height= 8)
p_phi_cord <- cowplot::plot_grid(p_phi_1a, guitar_chords, nrow = 2, align = "v")
p_phi_cord
#phi_legend <- cowplot::get_legend(p_phi_1)
# cowplot::ggdraw(
# cowplot::plot_grid(
# cowplot::plot_grid(p_phi_1a, guitar_chords, ncol = 1, align = "v"),
# cowplot::plot_grid(phi_legend, NULL, ncol = 1),
# rel_widths = c(10,1)
# ))
ggsave('figures/phi_heatmap_withlegend.png', width = 10, height = 10)
options(par)
# More color-blind friendly colorbalettes
#http://colorbrewer2.org/#type=qualitative&scheme=Paired&n=10
cb10 <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a')
cb12 <- c('#a6cee3','#1f78b4','#b2df8a','#33a02c','#fb9a99','#e31a1c','#fdbf6f','#ff7f00','#cab2d6','#6a3d9a','#ffff99','#b15928')
# Less color-blind friendly, but still nice.
#https://sashat.me/2017/01/11/list-of-20-simple-distinct-colors/
trub20 <- c('#e6194b','#3cb44b','#ffe119','#0082c8','#f58231','#911eb4','#46f0f0','#f032e6','#d2f53c','#fabebe','#008080','#e6beff','#aa6e28','#fffac8','#800000','#aaffc3','#808000','#ffd8b1','#000080','#808080','#FFFFFF','#000000')
options(repr.plot.width=8, repr.plot.height= 4)
Bookmark Here. Stuck for strange reasons.
ordered_pub_id_df <- psN2 %>% sample_data %>% dplyr::select(pub_id, rMDS1, newname, MDS1) %>% arrange(rMDS1) %>% mutate(MDS1 = round(MDS1, 2), fig3tick = paste(pub_id, MDS1,sep = "_"))
Setting class(x) to multiple strings ("tbl_df", "tbl", ...); result will no longer be an S4 object
ordered_pub_id_df
fig3tick <- ordered_pub_id_df %>% pull(fig3tick)
p_phy <- plot_bar(psN2, x = 'newname', fill = 'Phylum') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("All Phyla")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_phy
#ggsave('plots/Phyla_by_wuf1.png')
p_firm <- subset_taxa(psN2, Phylum == 'Firmicutes') %>%
plot_bar( x = 'newname', fill = 'Order') +
theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) +
scale_fill_manual(values = cb10) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_firm
#ggsave('plots/MostFirmicutesAreClostridiales.png')
p_clostridia <- subset_taxa(psN2, Class == 'Clostridia') %>%
plot_bar( x = 'newname', fill = 'Family') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("Families of Order Clostridiales (All Class Clostridia)")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_clostridia
# p_porph <- subset_taxa(psN2, Family == 'Porphyromonadaceae') %>%
# plot_bar( x = 'newname', fill = 'Genus') + scale_fill_manual(values = cb10) #+ theme_bw()
# p_porph
# p_bact <- subset_taxa(psN2, Phylum == 'Bacteroidetes') %>% # all class (Bacteroidia), order (Bacteroidales)
# plot_bar(x = 'newname', fill = 'Family') + scale_fill_manual(values = cb10) #+ theme_bw()
# p
# ggsave('figures/Bacteroidetes_Families.png')
p_ClosXI <- subset_taxa(psN2, Family == 'Clostridiales_Incertae_Sedis_XI') %>%
plot_bar( x = 'newname', fill = 'Genus') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("Genera of Family Clostridiales_Incertae_Sedis_XI")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_ClosXI
#ggsave('figures/Clostridiales_Incertae_Sedis_XI_Genus.png')
p_Bact <- subset_taxa(psN2, Phylum == 'Bacteroidetes') %>%
plot_bar( x = 'newname', fill = 'Family') + scale_fill_manual(values = cb10) + xlab("") +
ggtitle("Families of Class Bacteroidetes (All Order Bacteroidiales)")+ theme_bw() +
theme(axis.text.x = element_text(angle = 90, hjust = 1,, vjust = 0.5, size = 10),
strip.text.y = (element_text(angle = 90))) + scale_x_discrete(labels = fig3tick) + labs(x = "pub_id _ MDS1", y = "Relative Abundance")
p_Bact
#ggsave('figures/Bacteroides.png')
options(repr.plot.width=14, repr.plot.height= 8)
sb <- cowplot::plot_grid(p_phy, p_Bact,p_clostridia,p_ClosXI,ncol = 2, labels = c("A", "B", "C", "D"))
sb
cowplot::save_plot('figures/stacked_bars.png', sb, base_width = 14, base_height = 8)
cowplot::save_plot('figures/stacked_bars.svg', sb, base_width = 14, base_height = 8)
# psDf %>%
# mutate(OTU = map(ps, ~data.frame(otu_table(.)))) %>%
# mutate(Tax = map(ps, ~data.frame(tax_table(.)))) %>%
# mutate(OTUCount = map (psCount, ~data.frame(otu_table(.)))) %>%
# pass -> psDf1
psDf %>%
mutate(OTU = map(ps, ~data.frame(otu_table(.)))) %>%
mutate(Tax = map(ps, ~as.data.frame(.@tax_table@.Data))) %>%
mutate(OTUCount = map (psCount, ~data.frame(otu_table(.)))) %>%
pass -> psDf1
psDf1 %>%
.[1:5,] %>%
mutate(Tax = map(Tax, ~dplyr::select(.,-oldname2))) %>%
print
# https://stackoverflow.com/questions/50341012/return-the-mapped-object-if-expression-inside-of-purrrpossibly-fails/50341205#50341205
rm_oldname2 <- function(x){
f = possibly(function() dplyr::select(x, -oldname2), otherwise = x)
f()
}
psDf1 %>%
#.[1:5,] %>%
mutate(Tax = map(Tax, rm_oldname2)) %>%
pass -> psDf1b
print(psDf1)
# Show which species level OTUs are contained in each agglomerated group:
psDf1b %>%
.[1:5,] %>%
mutate(TaxIdx = map(Tax, function(df){
df %>%
mutate(tag = as.character(tag), oldGroups = as.character(oldGroups)) %>%
dplyr::select(tag, oldGroups) %>%
mutate(oldGroups = strsplit(oldGroups, ",")) %>%
unnest(oldGroups)
})) %>%
dplyr::select(taxLevels, TaxIdx) %>%
unnest(TaxIdx) %>%
mutate(oldGroups = trimws(oldGroups)) %>% # Some of these have leading or trailing whitespace
spread(taxLevels, tag) %>%
dplyr::select(oldGroups, Phylum, Class, Order, Family, Genus) %>%
pass -> taxGroupMapping
write_csv(taxGroupMapping, 'tables/taxGroupMapping.csv')
# Print out each otu table (relative abundances).
walk2(psDf1b$taxLevels, psDf1b$OTU,
~write.csv(.y, file = paste0("tables/OTU/otu_",.x, ".csv")))
# Print out each otu table (counts).
walk2(psDf1b$taxLevels, psDf1b$OTUCount,
~write.csv(.y, file = paste0("tables/OTU/otuCount_",.x, ".csv")))
# Print out each taxonomy table.
walk2(psDf1b$taxLevels, psDf1b$Tax,
~write_csv(.y, path = paste0("tables/Tax/tax_",.x, ".csv")))
save.image(file = "workspace.Rdata")
'package:stats' may not be available when loading
Add richness estemate to psN1. Jump that over to psN2
# fc <- psN1 %>% otu_table%>% make_frequency_count_table()
bN1 <- breakaway(psN1)
plot(bN1)
What am I supposed to make of this? Basically the UB dwarfs the actual values and I can hardly see differences. Normal ggplot transformations don’t seem to help too much.
bN1
A collection of 21 alpha diversity estimates:
$ Sample-57
Estimate of richness from method breakaway:
Estimate is 151
with standard error 32.13
Confidence interval: (128, 1214)
$ Sample-60
Estimate of richness from method breakaway:
Estimate is 230
with standard error 26.38
Confidence interval: (153, 1691)
$ Sample-62
Estimate of richness from method breakaway:
Estimate is 279
with standard error 70.08
Confidence interval: (171, 5250)
Cutoff: 13
$ Sample-63
Estimate of richness from method breakaway:
Estimate is 195
with standard error 7.99
Confidence interval: (174, 391)
Cutoff: 12
$ Sample-66
Estimate of richness from method breakaway:
Estimate is 237
with standard error 99.95
Confidence interval: (159, 6052)
$ Sample-67
Estimate of richness from method breakaway:
Estimate is 441
with standard error 154.89
Confidence interval: (255, 14442)
Cutoff: 16
$ Sample-72
Estimate of richness from method breakaway:
Estimate is 337
with standard error 108.36
Confidence interval: (243, 7276)
$ Sample-73
Estimate of richness from method breakaway:
Estimate is 296
with standard error 40.71
Confidence interval: (222, 2653)
$ Sample-75
Estimate of richness from method breakaway:
Estimate is 191
with standard error 47.69
Confidence interval: (111, 3077)
Cutoff: 11
$ Sample-76
Estimate of richness from method breakaway:
Estimate is 411
with standard error 124.61
Confidence interval: (227, 11790)
Cutoff: 16
$ Sample-78
Estimate of richness from method breakaway:
Estimate is 437
with standard error 200.77
Confidence interval: (214, 19854)
Cutoff: 11
$ Sample-81
Estimate of richness from method breakaway:
Estimate is 347
with standard error 250.1
Confidence interval: (221, 16987)
$ Sample-82
Estimate of richness from method breakaway:
Estimate is 251
with standard error 26.38
Confidence interval: (194, 1549)
Cutoff: 15
$ Sample-84
Estimate of richness from method breakaway:
Estimate is 293
with standard error 65.19
Confidence interval: (237, 3606)
$ Sample-86
Estimate of richness from method breakaway:
Estimate is 427
with standard error 131.81
Confidence interval: (304, 10038)
$ Sample-87
Estimate of richness from method breakaway:
Estimate is 381
with standard error 222.1
Confidence interval: (246, 16117)
$ Sample-89
Estimate of richness from method breakaway:
Estimate is 367
with standard error 71.41
Confidence interval: (232, 6034)
Cutoff: 14
$ Sample-91
Estimate of richness from method breakaway:
Estimate is 337
with standard error 293.43
Confidence interval: (204, 19770)
$ Sample-92
Estimate of richness from method breakaway:
Estimate is 188
with standard error 35.39
Confidence interval: (125, 2065)
Cutoff: 13
$ Sample-93
Estimate of richness from method breakaway:
Estimate is 268
with standard error 43.71
Confidence interval: (227, 2182)
$ Sample-98
Estimate of richness from method breakaway:
Estimate is 440
with standard error 238.35
Confidence interval: (239, 21560)
btN1 <- betta(summary(bN1)$estimate,
summary(bN1)$error,
make_design_matrix(psN1, "IgG_Con.6.gp120.B_Month_12")
)
btN1
$table
Estimates Standard Errors p-values
(Intercept) 2.471903e+02 17.66801201 0.000
predictors 1.217968e-03 0.01284556 0.924
$cov
(Intercept) predictors
(Intercept) 808.0871764 -0.4602607047
predictors -0.4602607 0.0004271582
$ssq_u
[1] 2285.535
$homogeneity
[1] 7.119130e+01 5.823446e-08
$global
[1] 197.8309 0.0000
$blups
[1] 181.5085 234.7282 258.2562 196.2611 245.3279 264.8319 262.9598 276.2470 218.7954 270.4313 257.4898 250.9070 250.6854
[14] 263.7305 268.3523 253.7315 284.6785 250.7703 208.7303 259.0121 256.2835
gp41mtx <- cbind(1,
psN1 %>% sample_data %>% as.matrix %>% as.data.frame %>% pull(IgG_Con.6.gp120.B_Month_6.5) %>% as.character %>% as.numeric %>% medcode
)
names(gp41mtx) = c("(Intercept)", "predictors")
btNmonth6gp120 <- betta(summary(bN1)$estimate,
summary(bN1)$error,
gp41mtx
)
btNmonth6gp120
$table
Estimates Standard Errors p-values
[1,] 234.54417 16.85354 0.000
[2,] 21.03933 22.41648 0.348
$cov
[,1] [,2]
[1,] 653.359 -653.359
[2,] -653.359 1155.858
$ssq_u
[1] 1942.916
$homogeneity
[1] 5.219889e+01 6.177498e-05
$global
[1] 214.1934 0.0000
$blups
[1] 187.3018 236.6914 247.2350 196.0839 234.8694 269.5101 267.1269 277.4026 214.2708 272.8225 243.8547 237.9443 252.1952
[14] 267.4423 253.8635 240.0967 271.2246 257.3775 205.9923 261.8624 261.6877
Does MDS1 associate with richness?
btN_MDS <- betta(summary(bN1)$estimate,
summary(bN1)$error,
make_design_matrix(psN1, "MDS1")
)
btN_MDS
$table
Estimates Standard Errors p-values
(Intercept) 243.9720 16.82711 0.000
predictors -31.7198 27.31450 0.246
$cov
(Intercept) predictors
(Intercept) 287.4724 57.2095
predictors 57.2095 757.4669
$ssq_u
[1] 1932.271
$homogeneity
[1] 6.250059e+01 1.546382e-06
$global
[1] 215.6907 0.0000
$blups
[1] 190.9549 235.6741 241.4954 196.3625 222.4246 270.5355 262.2978 280.2053 194.4011 282.1363 231.4248 227.0622 253.5892
[14] 260.3696 266.7135 248.6283 296.0946 232.9051 201.2419 264.0743 264.8205
Not in any way that is statistically significant.